CHAPTER 1


INTRODUCTION 


1.1 MOTIVATION 

When two or more agents cooperate to perform a task, their ability to communicate decides their success. Communication is equally important whether the agents are all people, all programs, or some of each.

Since the inception of computers as information systems, attempts have been made to improve communication between programs that retrieve information from large amounts of data and the people who need this information. Man-machine communication using language caters to broader needs than other schemes like pointing, menu selection, drawing, etc.

A Natural Language Interface to a database is a program that accepts queries in a Natural Language (like English) and generates a formal query for accessing the given database. A query processor processes the formal query and generates the response, which is presented to the user.
A typical example is in management information systems, where a user wishes to extract information from a database to make decisions. Ideally, he wishes to interact in his own terminology rather than consult a programmer. To help such a user, we need a Natural Language Interface to the database. Another example is when a casual user wishes to find information from a database. Examples of this situation are a reader in a library and a visitor to a firm.

Most of the existing NLI systems are very finely tuned to the specific domain of their discourse. Porting them to a new database is as difficult as redesigning the system for the new database. This motivated us to design a system that comprehends the general principles of Natural Language Interfaces to databases and eases portability to new databases. We aimed at designing a Natural Language Interface system that takes the database specification as a parameter. To transport the system to a new database, only the new parameters have to be supplied.

1.2 OBJECTIVE

The objective is to build a system that accepts a large subset of English queries on its domain of discourse and converts them into formal queries. At the same time, the system should be easily portable to new databases.

There are four aspects which clearly define the capabilities of a Natural Language Interface. They are: the potential users of the system, the portability of the system to a new database, the language accepted, and the constraints on the database structure. We shall formulate our objectives in terms of these four aspects.


1.2.1 USER

The system is aimed at a novice or casual user. He is not expected to know the organisation of the database. This constrains the language accepted to be close to English rather than a command language.


1.2.2 PORTABILITY

The system is aimed to be independent of any particular database.

When the system is to be ported to a new database, the designer specifies the parameters of the new database. We propose to formalise this database specification and algorithmise the translation of these specifications into system parameters.
1.2.3 LANGUAGE

Natural Language Interface we rarely mean the entire syntax of the Natural Language. Instead, we propose to design the system for a reasonably large subset of Natural Language. Also, the system accepts minor errors in syntax (like disagreement in "case" and "number"). The language should enable the user to learn quickly what types of constructs are accepted and what others are not.

The system does not attempt to understand the query, nor to capture the full semantics of the query. Instead, the knowledge associated with certain database-specific words of the query and the syntactic structure of the query contribute to the conversion from the English query to the formal query. Although the system does not understand the query, we claim that the syntactic regularity and the database-specific words together determine the formal query almost unambiguously.

1.2.4 DATABASE

We assume that a relational database with all its relations, fields, and keys is given. We propose to design the system for any relational database, unlike some systems (like LUNAR, see section 2.2.3) which assume a flat file as the database.
1.3 ORGANISATION OF THE THESIS

Chapter 2 gives a brief survey of Natural Language Interface systems and points out their drawbacks viewed in the light of our objectives. It also introduces the basic formalisms of Natural Language parsers. Chapter 3 gives an overview of our system, which has four phases. Chapters 4, 5 and 6 discuss these phases in detail, and chapter 7 contains the conclusions.



CHAPTER 2 


SURVEY OF RELATED SYSTEMS AND ATNS AS PARSER

2.1 INTRODUCTION

A brief survey of relevant natural language systems, followed by a brief introduction to Augmented Transition Networks, is the subject of this chapter.

2.2 NATURAL LANGUAGE SYSTEMS

The Natural Language (henceforth NL) Interfaces discussed here are PLANES, RENDEZVOUS, LUNAR and LADDER. Each of the systems is designed for a specific data base. We examine, in brief, how each system works and identify the basic principles of each system.

We shall appraise each system by its adaptability to a new data base; this is achieved by examining the sensitivity of the principles adopted in each system to a change in the specific data base for which it is designed. The second aspect for appraisal is the habitability of the language accepted by each. These parameters, chosen for appraisal, are the prime objectives of our system; they help us to view each system in the light of our objectives.
2.2.1 PLANES

PLANES is an NL interface for answering queries on a data base of aircraft flight and maintenance records.

Normal operation of PLANES entails three steps, namely, parsing, concept case frame generation, and query generation.

While parsing, a first pass performs spelling correction, substitutes roots and inflection markers for inflected words, and removes noise phrases; a second pass transforms the query into a set of unordered semantic constituents (like plane-type, date, and time). This is accomplished by using a set of subnets (ATNs), each of which recognises a semantic constituent.

The second step compares the unordered semantic constituents with standard concept case frames. These concept case frames are templates consisting of a sequence of semantic constituents. The matched template, or concept case frame, is issued to the next phase. In case some constituents are missing from the standard templates, the required constituents are borrowed from the context set by the previous queries, thus meeting the needs of ellipsis.
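The matching step can be pictured with a small Python sketch. The frame names, constituent labels and values below are hypothetical, and the matching rule (choose the template sharing the most labels with the query, then fill any missing label from the context of earlier queries) is an assumption for illustration rather than PLANES's actual procedure.

```python
# Hypothetical concept case frames: name -> sequence of constituent labels.
CASE_FRAMES = {
    "flight-hours-query": ("plane-type", "date", "flight-hours"),
    "maintenance-query": ("plane-type", "date", "maintenance-action"),
}


def match_case_frame(constituents, context):
    """constituents: label -> value for the current (unordered) query.
    context: label -> value left by previous queries.
    Returns (frame name, filled frame) or None."""
    best = max(CASE_FRAMES,
               key=lambda name: len(set(CASE_FRAMES[name]) & set(constituents)))
    filled = {}
    for label in CASE_FRAMES[best]:
        if label in constituents:
            filled[label] = constituents[label]
        elif label in context:
            filled[label] = context[label]   # ellipsis: borrow from context
        else:
            return None                      # frame cannot be completed
    return best, filled


previous = {"plane-type": "A7", "date": "May 1973"}
print(match_case_frame({"flight-hours": "?"}, previous))
```

Here the follow-up query supplies only a flight-hours constituent, and the plane type and date are taken over from the previous query.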
The third step interacts with the DB and generates the query from the concept case frame. A key finding of PLANES is that a set of semantic constituents uniquely decides the formal query. Using this principle, it converts the concept case frame (a template of semantic constituents) into a formal query.

PLANES is not designed to adapt to a new data base; in fact, it uses many assumptions about its domain of discourse, shackling itself heavily (too heavily) to the specific domain of the aircraft data base.

The transformation of an unordered set of semantic constituents into a formal query makes for an ill-organised approach to prepositional modifiers; this principle fails not only in other domains but even in its own domain, as illustrated below.

The subnets of PLANES recognise the word "hours" as the semantic constituent "flight-hours", and the word "NOR" or "NOR hours" as the semantic constituent "not-operationally-ready-hours". So in the following query

"... hours of NOR ..."

PLANES recognises two semantic constituents while the user intends only one.
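The failure can be reproduced with a toy recogniser in which each subnet looks for its own trigger words independently of the others. The trigger lists and the sample query are hypothetical; the point is only that an unordered set of independently recognised constituents cannot tell that "hours" and "NOR" belong together here.

```python
# Each hypothetical "subnet" recognises one semantic constituent from its
# trigger phrases, independently of the other subnets.
SUBNETS = {
    "flight-hours": [("hours",)],
    "not-operationally-ready-hours": [("nor", "hours"), ("nor",)],
}


def recognise_constituents(query):
    """Return the unordered set of constituents whose trigger occurs."""
    words = query.lower().split()
    found = set()
    for label, triggers in SUBNETS.items():
        for trigger in triggers:
            n = len(trigger)
            if any(tuple(words[i:i + n]) == trigger
                   for i in range(len(words) - n + 1)):
                found.add(label)
                break
    return found


# The user means one constituent, but two are recognised.
print(sorted(recognise_constituents("total hours of NOR for May")))
# prints ['flight-hours', 'not-operationally-ready-hours']
```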

Similarly, the language accepted by PLANES constitutes a small subset of NL because it does not capture the syntactic regularity of the language. For a new data base with a larger subset of NL, it becomes very difficult to use the principles followed in PLANES.


2.2.2 RENDEZVOUS

RENDEZVOUS is an NL interface system for a relational data base dealing with suppliers, parts, shipments and projects.

The first phase, the analyser, converts the user's query into the formal query language DEDUCE. This conversion uses a set of rewrite rules, each of which maps a specific input pattern into a fragment of a DEDUCE query. The conversion is bottom-up and non-backtracking. The analyser resolves ambiguities by a dialogue with the user, using a routine called the menu-driver.



The second phase, the generator, restates the DEDUCE fragments in precise English and asks for the user's approval. On failure, the menu-driver takes over and the process repeats; on success, the generator passes the DEDUCE fragments to the third phase.

The third phase, the retriever, actually interacts with the data base and retrieves the answer.


RENDEZVOUS runs mainly on the rewrite rules. These rewrite rules are deterministic and non-backtracking, typically resulting in long clarification dialogues for even simple queries.
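A deterministic, non-backtracking rewrite loop of this kind can be sketched as follows. The two rules and the bracketed fragment strings are hypothetical stand-ins for DEDUCE fragments, whose syntax is not given here; each rewrite is committed as soon as a pattern matches and is never undone.

```python
# Hypothetical rewrite rules: a token pattern (words and/or fragments built
# earlier) is replaced by a new fragment. Strings stand in for DEDUCE.
RULES = [
    (("red", "parts"), "PARTS[colour=red]"),
    (("suppliers", "of", "PARTS[colour=red]"), "SUPPLIERS-OF(PARTS[colour=red])"),
]


def rewrite(tokens):
    """Bottom-up, deterministic rewriting: the first rule that matches
    anywhere is applied and the result is never revisited."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for pattern, fragment in RULES:
            n = len(pattern)
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == pattern:
                    tokens[i:i + n] = [fragment]   # commit; no backtracking
                    changed = True
                    break
            if changed:
                break
    return tokens


print(rewrite("list suppliers of red parts".split()))
# prints ['list', 'SUPPLIERS-OF(PARTS[colour=red])']
```

In this sketch, tokens that no rule consumes (here "list") simply remain; since an earlier rewrite can never be reconsidered, a system built this way has little choice but to fall back on asking the user.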


Designed with the goal of accepting any English query, RENDEZVOUS captures little syntactic regularity of the query. This results in its failure to answer certain queries correctly; queries involving more than two linking joins constitute a typical example.


It is not designed to adapt to a new data base. The rewrite rules are semantic-grammar based: they translate matched input patterns into DEDUCE fragments. When the system is to be ported to a new data base, all these rewrite rules have to be replaced, which involves a complete redesign.
2.2.3 LUNAR

LUNAR is an NL interface system for a NASA data base of chemical analysis data on lunar rock and soil composition.

The first phase, an ATN-based parser, produces a syntactic structure of the input query; it is able to map many surface forms onto the same parse.

The second phase, the semantic interpreter, maps specific syntactic structures into fragments of a formal query. The rules used in this transformation are determined from the verbs, the nouns and the determiners of the input query. Each lexical category (like verb, noun), together with a specific syntactic structure (like noun phrase, verb phrase) in which it can occur, defines a set of rules. These rules, derived from the semantics of the data base, map the syntactic structures into formal query fragments.

The third phase, the query generator, retrieves the answer using the formal query generated by the second phase.
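The idea of rules selected by a word together with the structure it occurs in can be sketched as a lookup table. The heads, structure labels and fragment notation are hypothetical; LUNAR's own rules and query language are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table keyed by (head word, syntactic structure). Each rule
# turns the already-interpreted parts of a constituent into a query fragment.
RULES: Dict[Tuple[str, str], Callable[[List[str]], str]] = {
    ("sample", "NP"): lambda mods: "SAMPLES" + "".join(f"[{m}]" for m in mods),
    ("contain", "VP"): lambda args: f"CONTAINS({', '.join(args)})",
}


def interpret(head: str, structure: str, parts: List[str]) -> str:
    """Select the rule for this word in this structure and apply it."""
    rule = RULES.get((head, structure))
    if rule is None:
        raise KeyError(f"no semantic rule for {head!r} in a {structure}")
    return rule(parts)


subject = interpret("sample", "NP", ["type=breccia"])
print(interpret("contain", "VP", [subject, "olivine"]))
# prints CONTAINS(SAMPLES[type=breccia], olivine)
```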

LUNAR uses certain principles that result in a general interface. To that effect, it encapsulates syntactic regularity and modularises the syntactic-to-semantic transformation. But it suffers from the following problems.

The data base used for LUNAR is a flat file. Consequently, it cannot be ported to a data base which contains entity relations and linking relations. Such data bases involve identifying the particular relation of a recognised field and performing operations like selection, projection, and join. These operations are not accounted for at all in LUNAR.

In LUNAR, no single element can appear in more than two domains (like phases of analysis, chemical constituents, units of measure, etc.). So it does not have the problem of finding an implied relationship between one noun and another that qualifies it: if there are two nouns and their domains are found, then the relation between them is fixed and unique. Such a restriction narrows the set of data bases which can adopt the principles of LUNAR. This is because, in many data bases, a single field can occur in more than one relation; even if the relations of a pair of fields are identified, the way they are related to each other through other fields is not unique. Such intelligent access of relations is not considered at all in LUNAR. So it cannot be ported to any practical data base.
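The non-uniqueness can be made concrete by enumerating every chain of relations, linked through shared fields, that connects two fields. The schema is hypothetical. In a flat file there is only one relation, so the search has a single trivial answer; in even this three-relation schema the fields emp and dept are connected in several ways, and an interface must choose among them.

```python
# Hypothetical relational schema: relation name -> set of field names.
SCHEMA = {
    "EMP": {"emp", "dept"},
    "ASSIGN": {"emp", "proj"},
    "PROJ": {"proj", "dept"},
}


def join_paths(field_a, field_b, schema=SCHEMA):
    """List every chain of relations, linked by shared fields, that leads
    from a relation holding field_a to a relation holding field_b."""
    starts = [r for r, fields in schema.items() if field_a in fields]
    paths = []

    def extend(path):
        last = path[-1]
        if field_b in schema[last]:
            paths.append(list(path))
            return                      # stop at the first relation with field_b
        for nxt, fields in schema.items():
            if nxt not in path and schema[last] & fields:
                extend(path + [nxt])    # join through a shared field

    for start in starts:
        extend([start])
    return paths


print(join_paths("emp", "dept"))
# prints [['EMP'], ['ASSIGN', 'EMP'], ['ASSIGN', 'PROJ']]
```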
LUNAR keeps every proper noun in its lexicon. Since the number of different types of entries is small, it could use such a scheme. If we contrast it with a data base of personnel records holding names of persons and street and lane addresses, we realise how significantly the size of the lexicon increases. This is another restriction in applying the approach followed in LUNAR.


2.2.4 LADDER

LADDER is designed to answer queries on a US Naval command and control data base.

The following is a brief description of how LADDER converts its English query into a formal query.

The first phase, INLAND, takes in restricted English queries and produces skeleton queries. It identifies the fields of the data base that are specified in the query. It does not mention the various relations to be accessed, the joins to be made, or other such operations to be performed.

The second component, IDA, breaks the skeleton queries into subqueries indicating the relations to be accessed and the linking to be made. These subqueries are passed to the next phase.


The third phase, FAM, decides what files to access and finally retrieves the answers from the data base.

LADDER uses LIFER as the parser. To appraise LADDER's linguistic power, we need to know the principles of LIFER.

LIFER is a semantic-grammar-based parser generator. It is independent of any particular data base. It consists of a set of interactive functions for specifying the language fragment to be used for a specific application. These interactive functions define productions of the following format.

<meta-symbol> => <pattern>, <expression>

where <meta-symbol> is a meta-symbol of the language (like the semantic constituents of the data base for which it is used), <pattern> is a pattern of input language symbols and meta-symbols, and <expression> is a LISP expression whose value, when computed, gives the value to be assigned to <meta-symbol>.
The input is parsed left-to-right, top-down. Whenever a meta-symbol is recognised, the corresponding parse tree is constructed, and the <expression> associated with the <meta-symbol> is evaluated and put along with the <meta-symbol> in the parse tree. When the tree is finally completed, the value of the meta-symbol representing the root of the tree is the formal query. LIFER's meta-symbols are not syntactic structures (like noun phrase, verb phrase). Instead, they are semantic constituents of the specific data base (like ship-name and ship-property in the case of LADDER). LIFER encourages the interface designer to specify a language which encodes the syntax of the input language into the db-constituents.
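A semantic grammar in this style can be sketched with a small top-down, left-to-right matcher. The productions, ship names and the SQL-like output string are hypothetical, and Python lambdas stand in for LIFER's LISP expressions; this is not LIFER's interface.

```python
# Hypothetical semantic grammar in the style of LIFER productions:
#   <meta-symbol> => <pattern>, <expression>
# Meta-symbols are data base constituents, not syntactic categories.
GRAMMAR = {
    "<query>": [
        (["what", "is", "the", "<attribute>", "of", "<ship>"],
         lambda attr, ship: f"SELECT {attr} FROM SHIPS WHERE NAME = '{ship}'"),
    ],
    "<attribute>": [(["length"], lambda: "LENGTH"),
                    (["speed"], lambda: "SPEED")],
    "<ship>": [(["kennedy"], lambda: "KENNEDY"),
               (["nautilus"], lambda: "NAUTILUS")],
}


def parse(symbol, words, pos):
    """Top-down, left to right: yield (value, next position) for every
    way the meta-symbol can match words starting at pos."""
    for pattern, expression in GRAMMAR[symbol]:
        yield from match(pattern, expression, words, pos, [])


def match(pattern, expression, words, pos, values):
    if not pattern:
        # pattern complete: evaluate the expression on the sub-values
        yield expression(*values), pos
        return
    head, rest = pattern[0], pattern[1:]
    if head in GRAMMAR:                              # a meta-symbol: recurse
        for value, nxt in parse(head, words, pos):
            yield from match(rest, expression, words, nxt, values + [value])
    elif pos < len(words) and words[pos] == head:    # a literal word
        yield from match(rest, expression, words, pos + 1, values)


def to_formal_query(sentence):
    """Value of the root meta-symbol, if the whole sentence parses."""
    words = sentence.lower().split()
    for value, end in parse("<query>", words, 0):
        if end == len(words):
            return value
    return None


print(to_formal_query("What is the length of Kennedy"))
# prints SELECT LENGTH FROM SHIPS WHERE NAME = 'KENNEDY'
```

Every production here names data base constituents (<ship>, <attribute>) directly, which is why such a grammar has to be rewritten when the data base changes.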

We shall now see the problems involved in the portability of LADDER. These problems are inherent in any semantic-grammar-based approach; they carry over to LADDER because it uses LIFER, a semantic parser.

LIFER does not carry the description of the language from one application domain to another, as it does not capture the syntax at all. So whenever the data base is changed, a new set of semantic productions has to be designed. This is no mean task, and no less than a total redesign.





The design of the semantic rules for a new data base forces the designer to map input language patterns into formal query fragments. This encoding of the input language into the output language is done in a single step by the productions. This requires a high level of expertise in understanding the content of the data base and the structure of the potential English patterns, hindering an easy switch to a new data base. As the input language is expanded, these productions become difficult to conceive and complex to implement, and their ad hoc character eventually renders them unusable.

Since little attention is paid to syntax, the language accepted is a small subset of NL, and any extension, as discussed above, is difficult.

From this survey we can draw the following conclusions.

Each of the systems discussed above is designed for a specific data base; their tolerance to a change in the data base is small.

PLANES and RENDEZVOUS are just glued to their respective data bases, while LUNAR and LADDER use some principles that are general and independent of any data base in particular. However, LUNAR's assumptions about its simple (too simple) data base do not warrant portability to any practical data base, while LADDER's semantic-grammar-based parser makes its language poor and inextensible. Moreover, a change in the data base, in LADDER, involves a complete redesign of the semantic productions. These problems render both systems unadaptable to a new data base.


Nevertheless, LUNAR's language power is high. This is mainly because it captures the syntactic regularity of the language, thus enabling it to accept a large subset of NL. Similarly, LADDER's data base access is powerful. It interprets missing joins by using a schema derived from the data base. Also, it makes intelligent access to the data base.

Our aim, as formulated in chapter 1, is to design a system that has both intelligent data access and habitable NL power. Added to this, it is expected to be easily portable from one data base to another. The extent to which we met these aims is discussed in chapter 7.
2.3 ATNS AS PARSER

The Augmented Transition Network (ATN) is a formalism to parse NL. Its main appeal is its simplicity, which has made it one of the most common methods of parsing in NL interface systems (like LUNAR and PLANES). We use ATNs to recognise the field descriptions and to parse the query.

We first describe transition networks and then see how augmented transition networks are built from them. Finally, we give the ways in which our formalism deviates from the standard formalism.


2.3.1 TRANSITION NETWORKS 

A transition network is a finite state machine. The arcs of a transition network indicate a linguistic category to which the current input word should belong.

An arc is traversed if the current word of the input sentence belongs to the category indicated on the arc. When one arc is traversed, exactly one word is "consumed". When the final arc "done" is encountered, the sentence should have been completely consumed; otherwise, the transition network returns a failure, indicating that the sentence cannot be accepted.

Fig 2.1 shows a simple transition network which accepts simple sentences. Consider "the dog bites the cat". The arcs that match the input are 1, 2, 4, 5, 6.
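A transition network of this kind is simply a finite state machine over lexical categories. The sketch below uses a hypothetical lexicon and a straight-line network that is not the one in Fig 2.1; each traversed arc consumes exactly one word, and the sentence is accepted only if the input is exhausted when "done" is reached.

```python
# Hypothetical lexicon: word -> lexical category.
LEXICON = {"the": "det", "a": "det", "dog": "noun", "cat": "noun",
           "bites": "verb", "sees": "verb"}

# Hypothetical network: state -> list of (category, next state).
# The pseudo-category "done" marks the final arc.
NETWORK = {
    "S0": [("det", "S1")],
    "S1": [("noun", "S2")],
    "S2": [("verb", "S3")],
    "S3": [("det", "S4")],
    "S4": [("noun", "S5")],
    "S5": [("done", None)],
}


def accepts(sentence, state="S0"):
    """Return True if the network accepts the sentence."""
    words = sentence.lower().split()
    pos = 0
    while True:
        for category, next_state in NETWORK[state]:
            if category == "done":
                return pos == len(words)   # all words must be consumed
            if pos < len(words) and LEXICON.get(words[pos]) == category:
                pos += 1                   # one arc consumes exactly one word
                state = next_state
                break
        else:
            return False                   # no arc matches the current word


print(accepts("the dog bites the cat"))    # True
print(accepts("the dog bites"))            # False
```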

2.3.2 AUGMENTED TRANSITION NETWORKS

A Transition Network augmented with two facets evolves into an Augmented Transition Network (ATN). The first is the introduction of the concept of a "register"; the second is an extended meaning of "arc".

Each ATN is associated with a set of registers. A register is an ASSOC list of the following format.

(<label> <value>)

where <label> is the name of the register and <value> is the value set to the register.

An arc in a Transition Network indicates only lexical categories (like verbs, nouns), whereas the arcs of an ATN can indicate syntactic categories (like noun phrases, verb phrases) as well. An arc labelled with a syntactic category is traversed if the next few words match the ATN defining that syntactic category.

[Figure: A transition network]

Another change in the "arc" is the introduction of explicit conditions. In Transition Networks, the only condition on an arc is that the current word should belong to the category indicated by the arc. This is called the implicit condition. The implicit condition in the arcs of an ATN is that the next word belongs to the lexical category (if the arc indicates a lexical category), or that the next few words form the syntactic category (if the arc indicates a syntactic category). Besides this implicit condition, an arc of an ATN may have explicit conditions which must be satisfied to traverse the arc. These explicit conditions are called simply "conditions". A third extension to the arc is "actions". In an ATN, when an arc is taken, the corresponding parse structure, if any, has to be built. To do so, each arc may have some "actions" which produce the required side effects to construct the parse structure. Thus, including these three changes, a typical arc of an ATN looks as below.

(<conditions> <arc-type> <actions> <next-node>)

where <arc-type> indicates whether the arc recognises a lexical category or a syntactic category (or some other type, as discussed later), <conditions> are the explicit conditions to be satisfied for a successful traversal of the arc, <actions> are the actions to be carried out on successful traversal, and <next-node> indicates the node to which control transfers after the arc is traversed.

We now present a complete description of the ATN formalism.

1. It consists of a set of ATNs. One of them is considered distinguished and is normally given the name "S".

2. An ATN is a 4-tuple (label, states, arcs, registers) comprising the name of the ATN, the set of states, the set of arcs, and the set of registers.

The registers are of two types: feature registers and role registers.
The feature registers are used to store the features of the words recognised. For example, if an arc recognises a noun, then the actions of the arc may set the feature register "number" to "singular" or "plural" according as the word is singular or plural. Similarly, another feature register could be named "person", which is set to the person (1st, 2nd, 3rd) of the noun. On recognising the word "kicks", the following feature registers may be set to the appropriate values.

(word kicks)
(tense present)
(voice active)
(number singular)
A role register contains, as its value, a register frame. A register frame consists of

1. a label indicating the name of the frame, and
2. a set of registers denoted by their name-value pairs.

A register frame is shown in Fig 2.2. The definition of a register frame is recursive because a role register, which is a constituent of a register frame, contains, as its value, another register frame.
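The definitions above can be written down as plain data types. This is a sketch under assumptions: the field names are ours, the category or ATN name of an arc is kept in a separate argument field, and Python callables stand in for LISP conditions and actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Registers = Dict[str, object]   # register label -> value (an ASSOC list)


@dataclass
class Arc:
    """(<conditions> <arc-type> <actions> <next-node>); the argument
    (category or ATN name) is split out of <arc-type> in this sketch."""
    arc_type: str                                    # "CAT", "SEEK", ...
    argument: Optional[str] = None
    conditions: Callable[[Registers], bool] = lambda regs: True
    actions: Callable[[Registers], None] = lambda regs: None
    next_node: Optional[str] = None


@dataclass
class ATN:
    """The 4-tuple (label, states, arcs, registers)."""
    label: str
    states: List[str]
    arcs: Dict[str, List[Arc]]                       # state -> its arcs
    registers: Registers = field(default_factory=dict)


@dataclass
class ATNGrammar:
    """A set of ATNs, one of which is distinguished (normally "S")."""
    atns: Dict[str, ATN]
    start: str = "S"


np_atn = ATN(
    label="NP",
    states=["NP0", "NP1", "NP2"],
    arcs={
        "NP0": [Arc("CAT", "det", next_node="NP1")],
        "NP1": [Arc("CAT", "noun", next_node="NP2")],
        "NP2": [Arc("SEND")],
    },
)
```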

For example, consider Fig 2.3, showing an ATN to recognise a noun phrase. The first arc checks for a "det" and the second arc checks for a "noun". When an arc is successful, the actions associated with it construct a register frame consisting of the feature registers of that word. When the ATN is successfully traversed, the actions on the last arc construct a register frame which contains each of the constituents recognised.

Let "the bos" be the input to the ATN uf Fid 2*3* When 
"the" is recodnised as a "det"? the relevant feature 
redisters are constructed as shown in Fid 2*4* 

Similarly? the role redister "noun* is also constructed 
with relavant feature redisters as shown in Fid 2*5* At the 
end of the ATN? the role redister consistind of the 
constituents recodnised are as shown in the Fid 2 * 6 * 



[Fig 2.6: Register frame for noun phrase. DETERMINER: (WORD THE); NOUN: (WORD BOY) (NUMBER SINGULAR) (PERSON THIRD)]
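The construction of the frames for "the boy" can be sketched with nested dictionaries. The lexicon, the label spellings and the dictionary layout are assumptions; only the shape (a label plus registers, with role registers holding further frames) follows the text.

```python
# Hypothetical lexicon: each entry gives a category and feature registers.
LEXICON = {
    "the": ("det", {}),
    "boy": ("noun", {"number": "singular", "person": "third"}),
}


def word_frame(word):
    """Register frame for one word: a label plus (name value) registers."""
    category, features = LEXICON[word]
    return {"label": category.upper(),
            "registers": {"word": word.upper(), **features}}


def noun_phrase_frame(words):
    """Det + noun. The NP frame's role registers hold the word frames,
    which is what makes the definition of a register frame recursive."""
    if len(words) != 2:
        return None
    det, noun = word_frame(words[0]), word_frame(words[1])
    if det["label"] != "DET" or noun["label"] != "NOUN":
        return None
    return {"label": "NP", "registers": {"determiner": det, "noun": noun}}


print(noun_phrase_frame(["the", "boy"]))
```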
THE ARCS

The following arcs are common in the standard ATN formalism.

1. WORD arc :- This is taken when the current word is the same as the one specified in this arc.

2. CAT arc :- This arc is taken if the current word belongs to the lexical category specified by this arc.

3. JUMP arc :- This arc is taken unconditionally.

4. SEEK arc :- This arc specifies a syntactic category which must be matched by the current word and the next few words. So this arc is traversed if the ATN corresponding to the syntactic category specified in the arc is successful.

5. SEND arc :- This arc returns, with success, the structure parsed in that ATN. Control goes to the ATN which called it.

We shall illustrate with an example. Fig 2.7 shows an ATN grammar with two ATNs for simple sentences.



[Fig 2.7: ATN grammar for simple sentences; the recoverable arcs include (SEEK NP), (CAT DET) and (CAT NOUN)]
The conditions on all the arcs are kept "true" except on arc 2. The condition on this arc checks whether the value of the feature register "number" of the role register "NP" of the previous arc agrees with the value of the feature register "number" of the present arc. This is to allow only sentences with proper concord.

Consider the sentence "boy kicks the ball".

The first arc in S is a SEEK to NP. So control is transferred to the ATN NP. It parses "boy" as a noun phrase, returning the structure shown in Fig 2.8.

The next arc in S is a category arc which consumes "kicks" and sets the feature registers as shown in Fig 2.9. The condition on this arc is satisfied, so this arc is traversed and we reach the third arc. This is a SEEK arc, so control goes to the ATN NP, which parses "the ball" as a noun phrase and returns the role register shown in Fig 2.10 to the ATN S. By then, the sentence is completely consumed, and the SEND arc of S returns the role register shown in Fig 2.11.
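The walk-through above can be reproduced with a small backtracking interpreter for the standard arcs. The lexicon, the state names and the NP network (an optional determiner followed by a noun, so that "boy" alone is a noun phrase) are assumptions and not a transcription of Fig 2.7; structures are plain tuples rather than full register frames.

```python
# word -> (lexical category, feature registers); hypothetical lexicon
LEXICON = {
    "the": ("det", {}),
    "boy": ("noun", {"number": "singular"}),
    "boys": ("noun", {"number": "plural"}),
    "ball": ("noun", {"number": "singular"}),
    "kicks": ("verb", {"number": "singular", "tense": "present"}),
    "kick": ("verb", {"number": "plural", "tense": "present"}),
}


def agrees(frame, item):
    """Condition on arc 2: the verb's "number" must match that of the
    noun in the NP built by the previous arc."""
    subject_np = frame[0]            # ("NP", [constituents...])
    noun = subject_np[1][-1]         # ("noun", {feature registers})
    return noun[1].get("number") == item[1].get("number")


# ATN name -> state -> list of (arc type, argument, condition, next state)
ATNS = {
    "S": {
        "S0": [("SEEK", "NP", None, "S1")],
        "S1": [("CAT", "verb", agrees, "S2")],
        "S2": [("SEEK", "NP", None, "S3")],
        "S3": [("SEND", None, None, None)],
    },
    "NP": {
        "N0": [("CAT", "det", None, "N1"), ("JUMP", None, None, "N1")],
        "N1": [("CAT", "noun", None, "N2")],
        "N2": [("SEND", None, None, None)],
    },
}
START = {"S": "S0", "NP": "N0"}


def run(atn, words, pos):
    """Yield (structure, next position) for every way `atn` matches."""
    yield from walk(atn, START[atn], words, pos, [])


def walk(atn, state, words, pos, frame):
    for arc_type, arg, cond, nxt in ATNS[atn][state]:
        ok = cond or (lambda f, i: True)
        if arc_type == "SEND":                   # return the structure built
            yield (atn, frame), pos
        elif arc_type == "JUMP":                 # consumes no input
            yield from walk(atn, nxt, words, pos, frame)
        elif arc_type in ("WORD", "CAT") and pos < len(words):
            cat, feats = LEXICON.get(words[pos], (None, {}))
            item = (cat, {"word": words[pos], **feats})
            matched = words[pos] == arg if arc_type == "WORD" else cat == arg
            if matched and ok(frame, item):
                yield from walk(atn, nxt, words, pos + 1, frame + [item])
        elif arc_type == "SEEK":                 # call the sub-ATN
            for sub, after in run(arg, words, pos):
                if ok(frame, sub):
                    yield from walk(atn, nxt, words, after, frame + [sub])


def parse(sentence):
    words = sentence.lower().split()
    for structure, end in run("S", words, 0):
        if end == len(words):                    # all words must be consumed
            return structure
    return None


print(parse("boy kicks the ball"))
print(parse("boys kicks the ball"))   # rejected by the concord condition
```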
2.3.3 OUR FORMALISM


Our formalism differs slightly from the standard formalism to meet our needs more efficiently. Our system does not "understand" the English query in order to convert it into a formal query, so a deep structured parse is not needed. Detailed features like "number", "person" and "gender" are also not needed. This allows us to avoid knowing every feature of each word. The only features we are interested in are the lexical category of the word and the data base specific information, if any. This renders the feature register concept unnecessary for our formalism. The structure to be built on recognising any word is

(<lexical category> <data base specific information> <word>)

This structure is independent of <lexical category>, whereas in the standard formalism the structure depends on the <lexical category>. For example, in the case of a "verb" there is a feature register "tense", whereas in the case of a "noun" there is a feature register "person". In our formalism, there is no such change in structure. To construct such a fixed format, we do not need a special "feature register" concept.


Similarly, the structure to be built at the end of any ATN is also standardised as follows.

(<atn-name> (LIST of syntactic structures that led to success))

Such a structure can be built while traversing the ATN, and we do not need, for such a simple structure, a detailed role register concept.

Another place where registers are used is in the <conditions> on the arcs of an ATN in the standard formalism. These conditions check the registers of the previous arcs. This is met in our formalism as follows. The conditions on an arc in an ATN can access the structures recognised by previous arcs of the ATN. So they can access any structure of the previous arc and check any feature (only two features exist in our formalism, as explained earlier). In fact, the conditions on an arc can access any structure of the overall parse tree, although such inter-ATN communication is seldom needed in our formalism. This is because we are building a shallow structure, unlike the standard ATN formalism, which builds a deep structure.
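The two fixed structures and an arc condition that reads them can be sketched as follows. The lexicon entries and their data base specific information (a relation and field name pair) are hypothetical, as is the sample condition.

```python
from typing import List, Optional, Tuple

# (<lexical category> <data base specific information> <word>)
WordStructure = Tuple[str, Optional[Tuple[str, Optional[str]]], str]

# Hypothetical lexicon: word -> (lexical category, data base specific info).
# The info is a (relation, field) pair; field is None for a relation name.
LEXICON = {
    "salary": ("noun", ("EMP", "SALARY")),
    "employees": ("noun", ("EMP", None)),
    "of": ("prep", None),
    "the": ("det", None),
}


def word_structure(word: str) -> WordStructure:
    """Same three-part shape whatever the lexical category is."""
    category, db_info = LEXICON[word]
    return (category, db_info, word)


def atn_structure(atn_name: str, parts: List[tuple]) -> tuple:
    """(<atn-name> (LIST of syntactic structures that led to success))"""
    return (atn_name, list(parts))


def previous_is_field(parts: List[WordStructure]) -> bool:
    """Sample arc condition: look at the structure built by the previous
    arc of this ATN instead of reading a feature register."""
    if not parts:
        return False
    db_info = parts[-1][1]
    return db_info is not None and db_info[1] is not None


parts = [word_structure("the"), word_structure("salary")]
print(previous_is_field(parts))      # True
print(atn_structure("NP", parts))
```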
The actions associated with any arc in the standard ATN formalism typically build the parse tree. In our formalism, the parse tree structure is defined independently of the category of the word and independently of the ATN, unlike the standard formalism. For example, if a verb is recognised in the standard formalism, one type of register frame is constructed, while another type of register frame is constructed for a noun. In our formalism this is not the case: the parse structure is independent of the type of the word. Hence, we pushed "actions" into the interpreter, thus avoiding explicit mention of these actions. When an arc (that needs to build a parse structure) is taken, the actions are automatically carried out by the interpreter. Thus, when a category arc is taken, the following structure is automatically constructed.

(<cat-name> <data base specific info> <word>)

This structure is the same whatever <cat-name> is. Thus we avoid explicit actions on the arcs.

So, in our formalism, the structure of an arc is as follows.

(<pre-condition> <next-node>)

where <pre-condition>, in the simplest case, can be an arc-type. The pre-condition is EVALuated and, on success (non-NIL), <next-node> is reached. If there are conditions on the arc, then the <pre-condition> looks as below.

(and <condition 1>
     <condition 2>
     <condition 3>
     <arc-type>)

This is valid because, in our formalism, the <arc-type>s are defined as LISP macros or functions, thus allowing the user to define his own arcs. Such a unified approach to <arc-type> and <conditions> avoids unnecessary case checking in the (ATN) interpreter and also makes it easy for the designer to define arcs. In the standard ATN formalism, if a new arc is to be defined, it has to be incorporated in the interpreter as another case statement. This increases the potential for errors in the interpreter and also increases case checking. In our formalism, all that has to be done is to define a new function (or a macro) with the name of the arc. In case the arc has to build a parse tree, these side effects are incorporated in the function definition itself. These side effects are carried out automatically when the arc (a function or a macro) is EVALuated.
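The idea that an arc type is just a function, and that an arc is taken when its <pre-condition> evaluates to a true value, can be sketched in Python. Closures stand in for the LISP functions and macros; the names word, t, done and and_ (and is reserved in Python) are hypothetical, and unlike a full ATN interpreter this sketch takes the first successful arc and never backtracks.

```python
class ParseState:
    """Input words, current position and the structures built so far."""

    def __init__(self, words, lexicon):
        self.words, self.pos, self.lexicon = list(words), 0, lexicon
        self.parts = []

    def next_word(self):
        return self.words[self.pos] if self.pos < len(self.words) else None


def word(category):
    """Arc type: next word must be of `category`. Consuming the word and
    building (<category> <db info> <word>) happen inside the function."""
    def arc(state):
        w = state.next_word()
        if w is None or state.lexicon.get(w) != category:
            return False
        state.parts.append((category, None, w))   # no db info in this toy
        state.pos += 1
        return True
    return arc


def t():
    """Arc type T: a jump, always taken, consumes nothing."""
    return lambda state: True


def done():
    """Arc type DONE: succeeds only when the input is used up."""
    return lambda state: state.pos == len(state.words)


def and_(*preconditions):
    """(and <condition 1> ... <arc-type>): evaluated left to right and
    short-circuited, so the arc-type (last) runs only if conditions hold."""
    return lambda state: all(p(state) for p in preconditions)


def interpret(network, start, state):
    """network: node -> list of (<pre-condition>, <next-node>)."""
    node = start
    while node is not None:
        for precondition, next_node in network[node]:
            if precondition(state):
                node = next_node
                break
        else:
            return None            # no arc could be taken from this node
    return state.parts


LEX = {"the": "det", "salary": "noun"}
NP_NETWORK = {
    "N0": [(word("det"), "N1"), (t(), "N1")],
    # hypothetical condition: a noun is accepted only after a determiner
    "N1": [(and_(lambda s: len(s.parts) > 0, word("noun")), "N2")],
    "N2": [(done(), None)],
}

print(interpret(NP_NETWORK, "N0", ParseState(["the", "salary"], LEX)))
# prints [('det', None, 'the'), ('noun', None, 'salary')]
```

Adding a new arc type here means writing one more function of the parse state; the interpret loop itself does not change.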

We shall describe the arcs of our formalism.

ARCS 

All of them have the following format by default.

(<arc-type> <argument>)

The <argument> is not evaluated unless otherwise stated, and it is only one atom unless specifically mentioned.

1. PARSE arc :- The <argument> gives the name of the syntactic category into which the next few words have to be recognised. This is achieved by a call to the ATN which recognises the syntactic category. It is the same as the SEEK arc in the standard formalism.

2. DONE arc :- This has the structure "(done)". This arc indicates successful completion of the ATN, so it completes the actions to be done on successful completion of an ATN. The structure thus formed is returned to the ATN which called it. It is the same as the SEND arc in the standard ATN formalism.
3. T arc :- This is a JUMP arc which is taken unconditionally. It does not consume any input.

4. WORD arc :- This is taken if the next word belongs to the category indicated by the <argument> of the arc. The "category" is a lexical category or part of speech. This is the same as the CAT arc.

5. CHECK-FOR arc: This has an indefinite number of <argument>s. It checks whether the next input word is the same as any of the words passed to it as <argument>s. If the check succeeds, the arc is taken. It does not evaluate the <argument>s. Fig 2.12 shows it.

6. CHECK-LIST arc: This evaluates its argument, which should give a list. It then checks whether the next word is a member of this list. If so, the arc is taken; otherwise it fails to take the arc. It is shown in Fig 2.13.
7. SEEK arc: This takes two arguments: the first is a number (an integer) and the second is a word (not evaluated). The arc is a look-ahead arc. It succeeds if the second argument, namely the word, is among the first n words of the input sentence, where n is the integer given as the first argument. Fig 2.14 shows it.

8. SEEK-LIST arc: It is also a look-ahead function. It has the structure

"(seek-list list number)"

This arc evaluates the first argument to get a list. The arc succeeds if any word of the list so obtained is among the first n words of the input sentence, where n is the number given in the arc. Fig 2.15 illustrates this.
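The tests made by the WORD, CHECK-FOR, CHECK-LIST, SEEK and SEEK-LIST arcs can be sketched as plain predicates over the remaining input. Assumptions: the input is a Python list of words, the toy `LEXICON` is hypothetical, and CHECK-LIST and SEEK-LIST receive an already evaluated list in place of the LISP expression they evaluate. Only the tests are shown; consuming the word and building structure are left to the surrounding interpreter.

```python
# Hypothetical toy lexicon: word -> lexical category.
LEXICON = {"who": "int-pro", "taught": "verb", "course": "noun", "a": "det"}


def word_arc(words, category):
    """WORD arc: the next word belongs to the given lexical category (like CAT)."""
    return bool(words) and LEXICON.get(words[0]) == category


def check_for_arc(words, *candidates):
    """CHECK-FOR arc: the next word equals one of the unevaluated arguments."""
    return bool(words) and words[0] in candidates


def check_list_arc(words, evaluated_list):
    """CHECK-LIST arc: the next word is a member of the evaluated argument."""
    return bool(words) and words[0] in evaluated_list


def seek_arc(words, n, word):
    """SEEK arc: `word` occurs among the first n words (look-ahead only)."""
    return word in words[:n]


def seek_list_arc(words, candidates, n):
    """SEEK-LIST arc, (seek-list list number): some candidate is in the first n words."""
    return any(w in words[:n] for w in candidates)
```

Because SEEK and SEEK-LIST only inspect a prefix of the input, they can steer the parser toward the right path without committing to any words.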



CHAPTER 3 


SYSTEM OVERVIEW

The prime objectives of our system, as formulated in chapter 1, are acceptability of a habitable NL and DB-independence. In this chapter, we see how an attempt to honour these objectives forces certain design considerations, which in turn structure the system. The various modules so developed are identified and discussed in brief. A detailed discussion of each module follows in later chapters.

The last section discusses the implications of accepting an NL query and sets out the limitations of the NL accepted by our system.

3.1 DIFFERENT APPROACHES TO NLI 

Almost all NLI systems can be broadly classified into three categories according to the type of grammar they accept [Rich84]. They are

(i) Language through windows,

(ii) Semantic Grammars, and

(iii) Syntactic Grammars.

We have chosen the third; we digress a little here to justify our choice by discussing the relative merits of each of the above.

3.1.1 LANGUAGE THROUGH WINDOWS

This approach is similar to QBE with an NL flavour. The system opens some windows to the user on the screen. When the user specifies the values of the fields he knows and indicates the fields he needs, the system gives the output. An example is NLX, designed by Texas Instruments.

Such systems are relatively efficient, but the options available to the user are very rigorously constrained. Secondly, the user should know the organisation of the DB. These go against our objectives of a habitable NL and accessibility to a novice user.
3.1.2 SEMANTIC GRAMMAR APPROACH

This is used by some practical systems like LADDER [2] and PLANES [3]. We proceed with an example to discuss what it means.

Consider, for example, the LADDER system, which was designed for the US Navy. A small subset of the semantic grammar productions used by it is given in Fig 3.1.


S             --> QUERY SHIP-PROPERTY of SHIP
QUERY         --> what is / tell me
SHIP-PROPERTY --> the SHIP-PROP / SHIP-PROP
SHIP-PROP     --> speed / length / type
SHIP          --> the SHIP-NAME / SHIP-NAME
SHIP-NAME     --> Kennedy / Kitty hawk / ...

Fig 3.1 A subset of LADDER's grammar



The following questions

"what is length of Kennedy"

"what is speed of Kitty hawk", etc.

can be parsed very easily by the rules of the grammar of Fig 3.1. Each grammar rule or production has a set of actions, called transformation rules (trans-rules), which are applied whenever the grammar rule is applied. These trans-rules produce the formal query. Fig 3.2 shows how the formal query is produced from the sentence by using both the parsing rules and the trans-rules. The nodes of the tree in Fig 3.2 indicate the structures into which the query is parsed. Whenever a syntactic structure is recognised, a variable "V" is set to a data structure obtained from the syntactic structure using the transformation rules. In the figure, when "kennedy" is recognised as a SHIP-NAME, V is set equal to '(NAME EQ JOHN F KENNEDY). This structure is derived from

(i). the syntactic structure encircled in Fig 3.2, and

(ii). the transformation rules, which put in words like "NAME" and "EQ" and complete the name of the ship.

Similarly, other structures are also recognised.
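To make the idea concrete, the fragment of Fig 3.1 can be written as a tiny recogniser in which each rule carries its own trans-rule. This is a hypothetical sketch, not LADDER's code: the function name, the token-list input and the shape of the returned formal query are assumptions, and so is the DB value for Kitty hawk; only the (NAME EQ JOHN F KENNEDY) structure comes from the text above. Note that the categories it tests (SHIP-PROP, SHIP-NAME) are DB concepts, so both the grammar and the trans-rules are tied to this one database.

```python
# Trans-rule data for SHIP-NAME: surface words -> DB value.
# The Kennedy entry follows the text; the Kitty hawk entry is a hypothetical placeholder.
SHIP_NAMES = {"kennedy": "JOHN F KENNEDY", "kitty hawk": "KITTY HAWK"}
SHIP_PROPS = {"speed", "length", "type"}


def parse_query(tokens):
    """S --> QUERY SHIP-PROPERTY of SHIP; returns a formal query or None."""
    words = [t.lower() for t in tokens]
    # QUERY --> what is / tell me  (contributes nothing to the formal query)
    if words[:2] not in (["what", "is"], ["tell", "me"]):
        return None
    rest = words[2:]
    # SHIP-PROPERTY --> the SHIP-PROP / SHIP-PROP
    if rest[:1] == ["the"]:
        rest = rest[1:]
    if not rest or rest[0] not in SHIP_PROPS:
        return None
    prop, rest = rest[0].upper(), rest[1:]
    if rest[:1] != ["of"]:
        return None
    rest = rest[1:]
    # SHIP --> the SHIP-NAME / SHIP-NAME
    if rest[:1] == ["the"]:
        rest = rest[1:]
    name = " ".join(rest)
    if name not in SHIP_NAMES:
        return None
    v = ("NAME", "EQ", SHIP_NAMES[name])   # trans-rule of SHIP-NAME sets V
    return (prop, v)                       # hypothetical formal-query shape
```

For example, `parse_query("what is length of Kennedy".split())` gives `("LENGTH", ("NAME", "EQ", "JOHN F KENNEDY"))`, while any sentence outside these fixed patterns is simply rejected.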
The main difference between this and an ordinary English grammar is that the categories it uses are not formal English syntactic categories but ones designed specially for the interface. They are based on the semantics of the DB.

There are many problems with this approach:

* It is useful only when the language to be accepted is a very small subset of NL. As it does not capture the syntactic regularity of the language, it is more difficult to design the grammar for a large subset of NL. Eventually, its ad hoc character makes it unusable.

* It is designed keeping in view both the input language and the target program (or target language). The grammar rules and the trans-rules are designed in such a way that they directly map the structures of one into the other. However, as the specification of the input language becomes larger, it becomes very difficult to conceive the process in a single step, so the design of the grammar becomes very difficult.

* Whenever the DB is changed, a new semantic grammar, and along with it the trans-rules, has to be designed. This means an entire redesign of the system.

* It does not capture the language structure; it is based on matching certain predefined patterns.

All these go against the major objectives of our system.


3.1.3 THE SYNTACTIC GRAMMAR APPROACH

In this approach there are two steps. The first step parses a given input NL query according to certain rules of grammar. The second step takes the parsed output, applies heuristics using the DB structure, and produces the formal query.

The grammar used in the first step is made general enough to handle a large class of input queries. While it is designed with the knowledge that it will be used for parsing DB queries, and in that respect might differ from conventional grammars of NL, it is independent of any particular DB.
The second step, which takes the parse and produces the formal query, is called Formal Query Generation. The heuristics used in this phase utilise the syntactic structure of the parse, the meaning of certain DB-specific words, and the relational structure of the DB to produce the formal query.

Let us see this with an example.

Ex1. Consider the grammar shown in Fig 3.3.

S         --> INT-PRO VP NP
INT-PRO   --> who / what / which
VP        --> VERB / TO-BE GERUND
NP        --> DET NOUN / NOUN / PROP-NOUN
VERB      --> teach / offer / give / take / ...
TO-BE     --> is / was / are / were / ...
NOUN      --> teacher / teachers / ...
GERUND    --> teaching / offering / ...
PROP-NOUN --> CNAME / SNAME / TNAME / ...

Fig 3.3 A syntactic grammar


The query

"who taught systems programming"

has the parse shown in Fig 3.4.
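A recursive-descent recogniser for the grammar of Fig 3.3 shows what such a parse looks like as a nested structure. This sketch is hypothetical: the word sets stand in for the lexicon, proper nouns are assumed to reach the parser already tagged as `(field, value)` pairs such as `("CNAME", "systems programming")`, and the output only approximates the form of Fig 3.4.

```python
INT_PRO = {"who", "what", "which"}
VERB = {"teach", "taught", "teaches", "offer", "offered", "give", "take", "takes"}
TO_BE = {"is", "was", "are", "were"}
GERUND = {"teaching", "offering", "taking"}
DET = {"a", "an", "the"}
NOUN = {"teacher", "teachers", "course", "courses"}


def parse(tokens):
    """S --> INT-PRO VP NP. Proper nouns arrive pre-tagged as (field, value) tuples."""
    if len(tokens) < 3 or tokens[0] not in INT_PRO:
        return None
    tree = [("INT-PRO", tokens[0])]
    i = 1
    # VP --> VERB / TO-BE GERUND
    if tokens[i] in VERB:
        tree.append(("VP", ("VERB", tokens[i])))
        i += 1
    elif tokens[i] in TO_BE and i + 1 < len(tokens) and tokens[i + 1] in GERUND:
        tree.append(("VP", ("TO-BE", tokens[i]), ("GERUND", tokens[i + 1])))
        i += 2
    else:
        return None
    rest = tokens[i:]
    # NP --> DET NOUN / NOUN / PROP-NOUN
    if len(rest) == 1 and isinstance(rest[0], tuple):
        tree.append(("NP", ("PROP-NOUN", rest[0])))
    elif len(rest) == 1 and rest[0] in NOUN:
        tree.append(("NP", ("NOUN", rest[0])))
    elif len(rest) == 2 and rest[0] in DET and rest[1] in NOUN:
        tree.append(("NP", ("DET", rest[0]), ("NOUN", rest[1])))
    else:
        return None
    return tree
```

Nothing in this grammar mentions courses or teachers as such; the DB enters only through the tagged proper noun, which is what keeps the first step DB-independent.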

We define the following rules (for the grammar of Fig 3.3) to generate the formal query from the parse structure.

(i). Using the INT-PRO and the VP, identify the unknown. The verb or the gerund in the VP conveys information regarding what the needed field is.

(ii). Using the NP, decide the known or specified field.

Since the word "taught" indicates that we want "TNAME" and the NP indicates that we are given CNAME, a list is generated as shown below. The first element indicates that TNAME is needed and the second indicates that CNAME is given.

( (? TNAME)
  (= CNAME (systems programming)) )
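Rules (i) and (ii) can be sketched as a function from a parse, in the nested form used by a Fig 3.3 parser, to the list of needed and given fields. The `NEEDED_FIELD` table, which records that "who" with "taught" asks for TNAME, is a hypothetical stand-in for the DB-specific knowledge the text attributes to the verb; the parses in the usage are written out by hand.

```python
# Hypothetical DB knowledge: (question word, verb or gerund) -> field asked for.
NEEDED_FIELD = {
    ("who", "taught"): "TNAME",
    ("who", "teaching"): "TNAME",
    ("who", "taking"): "SNAME",
    ("who", "takes"): "SNAME",
}


def skeleton(parse):
    """Build [(?, needed), (=, field, value)] from a parse of the Fig 3.3 grammar."""
    parts = {node[0]: node[1:] for node in parse}
    # Rule (i): INT-PRO plus the verb or gerund of the VP names the unknown field.
    int_pro = parts["INT-PRO"][0]
    vp_word = next(leaf[1] for leaf in parts["VP"] if leaf[0] in ("VERB", "GERUND"))
    needed = NEEDED_FIELD[(int_pro, vp_word)]
    # Rule (ii): a PROP-NOUN in the NP supplies the known field and its value.
    label, (field, value) = parts["NP"][0]
    if label != "PROP-NOUN":
        raise ValueError("this sketch only handles proper-noun NPs")
    return [("?", needed), ("=", field, tuple(value.split()))]
```

Applied to the parse of "who taught systems programming", it returns `[("?", "TNAME"), ("=", "CNAME", ("systems", "programming"))]`, the list shown above.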


The following are some more examples which satisfy the grammar and can be processed by the above rules:

"who is teaching kumar"

"who are taking datastructures"

"who takes computerorganisations"

This approach is followed for the following reasons:

* A large subset of NL can be accommodated by it.

* The syntactic regularity in NL is captured, unlike in the other approaches.

* The grammar is designed with the knowledge that it will be used for parsing DB queries. However, it remains independent of any particular DB.

* Trans-rules depend only on the grammar and hence are independent of any particular DB.

We now see that the system has at least two main parts.

The first part produces the parse of the sentence. The parsing technique followed is the ATN formalism, which is discussed in chapter 2. As with any other parser, a lexicon is kept along with, but separate from, the parser. It contains the words usable in the queries along with their syntactic categories. The considerations in designing the grammar and the logical organisation of the lexicon are discussed in detail in chapter 4.

The second part applies heuristics to the parse structure and produces the formal query. This pass is called Formal Query Generation and is discussed in the next section. A block diagram of the system consisting of these two parts is shown in Fig 3.5.

3.2 FORMAL QUERY GENERATION

To understand the basic steps in this process, consider the following query:

"who taught systemsprog"

This conforms to the grammar described in Fig 3.3. The parse is obtained as below.

( (INT-PRO who)
  (VP (VERB taught))
  (NP (PROP-NOUN (CNAME SYSPROG))) )


The first step in formal query generation is identifying the fields, i.e. deciding what fields are needed and what fields are given. To get such information, many heuristics are used in this phase. These heuristics operate on the structure of the parse and also on the DB-specific information of certain words used in the query.

In the present example we proceed as below.

(1). From the interrogative pronoun and the VP, decide what the needed field is.

(2). From the NP, decide what the given field is.
So, in the above example, the verb "taught" and the INT-PRO "who" indicate that the needed field is "TNO". This is obtained from the DB-specific information kept with the word "taught" in the lexicon and from the fact that "who" is a personal INT-PRO.

So the needed field is TNO and the given field is CNAME. It is this kind of analysis that is done in this phase.

The information regarding what fields are needed and what fields are given is stored in a data structure called the skeleton structure. The skeleton structure of our example is shown below.

( (? TNO)
  (= CNAME (SYSTEMS PROGRAMMING)) )

where the first structure indicates that TNO is needed and the second indicates that CNAME is given.

In the second step of this pass, the skeleton query is converted into the formal query. This involves identifying the relations containing the given fields and issuing expressions indicating the necessary select, project, and join operations to be made.

Let us assume the DB structure shown in Fig 3.6.

The formal query generated is shown below. We give the formal query as English-like statements; the actual syntax of the formal queries is discussed in chapter 6.

1. Instantiate CNAME to "(systems programming)".

2. Select the tuples of the relation COURSE which have their CNAME field as already instantiated.

3. Project their CNO field.

4. Join these into the relation OFFER on the CNO field.

5. Project the TNO field of these tuples.
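The five steps can be traced on a toy database. The relation contents below are invented, and the schema COURSE(CNO, CNAME, DEPT) and OFFER(CNO, TNO) is only an assumption consistent with the fields named in the text (the real structure is the one in Fig 3.6). Step 4's join is written as a semi-join, filtering OFFER by the projected CNO values, which is sufficient here because only OFFER's TNO field is projected afterwards.

```python
# Hypothetical relations; the actual schema is the one in Fig 3.6.
COURSE = [
    {"CNO": "cs450", "CNAME": "systems programming", "DEPT": "computer science"},
    {"CNO": "cs412", "CNAME": "data structures", "DEPT": "computer science"},
    {"CNO": "ee201", "CNAME": "circuits", "DEPT": "electrical engineering"},
]
OFFER = [
    {"CNO": "cs450", "TNO": "t1"},
    {"CNO": "cs412", "TNO": "t2"},
    {"CNO": "cs450", "TNO": "t3"},
]


def select(relation, field, value):
    """Tuples of `relation` whose `field` equals `value`."""
    return [t for t in relation if t[field] == value]


def project(tuples, field):
    """Distinct values of `field`, in first-seen order."""
    values = []
    for t in tuples:
        if t[field] not in values:
            values.append(t[field])
    return values


def semi_join(values, relation, field):
    """Tuples of `relation` whose `field` value is among `values`."""
    return [t for t in relation if t[field] in values]


def who_taught(cname):
    """Steps 1-5 of the formal query, for a given course name."""
    courses = select(COURSE, "CNAME", cname)   # steps 1-2
    cnos = project(courses, "CNO")             # step 3
    offers = semi_join(cnos, OFFER, "CNO")     # step 4
    return project(offers, "TNO")              # step 5
```

With this toy data, `who_taught("systems programming")` returns `["t1", "t3"]`.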


Thus, we see that formal query generation has two distinct parts. The first part takes the parse of the query, identifies the various fields, decides what fields are needed and what fields are given, and finally constructs the skeleton structure. We call these skeleton structures LOCAL-TABLES. The generation of these LOCAL-TABLES involves heuristics which operate on the syntactic structure of the parse and the DB-specific information of the words. This process is discussed in detail in chapter 5.

The second step takes the skeleton structures and produces the formal query, which indicates what relations are to be chosen and what fields are to be projected, selected, and joined. This step utilises the information about the relational structure, the key fields, and the linking information. All this required information is mapped into a graph called the semantic graph. This graph is used to generate the formal query, and so this process is called the semantic graph driver; it is discussed in detail in chapter 6. A block structure of the system, showing "Formal Query Generation" divided into two parts, namely local-table generation and the semantic graph driver, is shown in Fig 3.7.


3.2.1 ORGANISATION OF LEXICON

We have seen above that the heuristics used in local-table generation use the syntactic as well as the DB-specific information of the words of the query.

For example, consider the query

"who taught cname = (data structures)"

The needed field of the above query is TNO. To get such information, we use the fact that "taught" refers to TNO. The best place to keep all such DB-specific information is the lexicon.

So we introduced another slot for each word besides the usual slot <lex-info>. This extra slot contains the DB-specific information. During parse tree construction, each word is kept in the parse tree along with not only its lexical category but also its DB-specific information. This gives the local-table generator easy access to the information.

The organisation of the lexicon is discussed in detail in chapter 4.


3.3 RECOGNITION OF FIELD DESCRIPTIONS

There are many alternative ways to describe fields in an NL query. To enable the parser to recognise such different descriptions of a field as one structure, we replace the field descriptions by a standard format called the canonical form.

Let us illustrate it with examples:

who is teaching data structures

who is teaching course of data structures

who is teaching data structure course


All the above queries have the same structure except for the difference in the description of a particular field, namely the course-name field. These must have the same parse. We identify such structures and replace them by a canonical form. This pass is called Normalization, and it is done before parsing. The canonical form in the present case is

"cname = (data structures)"

If this canonical form replaces the description of the field in the query, then the above queries reduce to

who is teaching cname = (data structures)

producing the same parse structure. The normalizer needs information about the various ways of describing a particular field. So we collect together all the possible descriptions of each DB field into a group collectively called Filter Networks. A driver routine, the Normalizer, is used to compare the query with the descriptions given by the filter networks; if any field description is identified, it replaces the field description with the canonical form. Filter Networks and the Normalizer are described in detail in chapter 4. As we have already seen, normalization should be carried out before parsing. The system structure including normalization is shown in Fig 3.8.

3.4 A COMPLETE EXAMPLE

We conclude this part of the chapter by giving an example which shows how the various blocks in Fig 3.8 affect the process of producing the formal query from the English query.

"who taught a course in the department of computer science"

I. NORMALIZATION:

The string of words "department of computer science" is recognised to be the field description of the field "dept". By replacing it with the canonical form, we get the following:

"who taught a course in the DEPT = (computer science)"

II. PARSING:

Besides recognising the linguistic structures involved in the above query, the parser also keeps some DB-specific information of certain words. The parse looks as below:

(CHAIN-CLAUSE (INT-PRO (PERSON) WHO)
    (VERB-PHRASE ((SUBJECT TNO) (OBJECT SNO)) TAUGHT)
    (NP (DET A) (NOUN (CNO) COURSE))
    (QUALIFIER (NO-INFO) TO)
    (NP (NOUN (DEPT) DEPT))
    (COMPARE (EQ (NO-INFO) =)
             (PROP-NOUN (DB-FIELD) (computer science))))
The grammar which produces this parse is not discussed here; it is enough to assume that the required grammar is present in the parser. A detailed discussion of the grammar is in chapter 4.

III. LOCAL-TABLE GENERATION

This pass has all the rules which indicate how to use the linguistic and the DB-specific information to get the fields. The rules depend upon the grammar.

This produces the output shown below:

(LOCAL-TABLE (NEED TNO)
             (VARS (TNO
                     (CNO
                       (= DEPT (computer science))))))

This table gives the information that (i) we need "tno"; (ii) the instantiated field is "dept"; and (iii) "dept" should be used to decide "cno", which in turn should be used to determine "tno". This is the hierarchy among the various fields.


IV. SEMANTIC GRAPH DRIVER

This gives the final form of the query; it supplements the intermediate fields, if necessary.

Assume the relational structure, the keys, and the linking information shown in Fig 3.6. The formal query shown here consists of English-like statements. The actual formal query syntax is discussed in chapter 6.

Formal Query

1. Instantiate "dept" to "(computer science)".

2. Select from the relation COURSE those tuples whose "dept" field has the already instantiated value.

3. Project the "cno" field of the above tuples.

4. Join these into the relation "OFFER" on "cno".

5. Project the "tno" fields of these tuples.
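The way the driver supplements intermediate fields can be pictured as a path search over a graph whose nodes are fields and whose edges come from the relations and linking information. This is an assumption-laden illustration, not the method of chapter 6: the edge list is a hypothetical reading of Fig 3.6 (COURSE links DEPT and CNAME to CNO, OFFER links CNO to TNO), and breadth-first search is used simply as one way to recover the chain dept, cno, tno.

```python
from collections import deque

# Hypothetical semantic graph edges: (field, field, relation linking them).
EDGES = [
    ("DEPT", "CNO", "COURSE"),
    ("CNAME", "CNO", "COURSE"),
    ("CNO", "TNO", "OFFER"),
]


def field_path(given, needed):
    """Breadth-first search from the given field to the needed one.

    Returns a list of (from_field, to_field, relation) steps, or None.
    """
    graph = {}
    for a, b, rel in EDGES:
        graph.setdefault(a, []).append((b, rel))
        graph.setdefault(b, []).append((a, rel))
    previous = {given: None}          # field -> (predecessor, relation)
    queue = deque([given])
    while queue:
        node = queue.popleft()
        if node == needed:
            break
        for nxt, rel in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    if needed not in previous:
        return None
    steps = []
    node = needed
    while previous[node] is not None:  # walk back to the given field
        prev, rel = previous[node]
        steps.append((prev, node, rel))
        node = prev
    return list(reversed(steps))
```

Here `field_path("DEPT", "TNO")` returns `[("DEPT", "CNO", "COURSE"), ("CNO", "TNO", "OFFER")]`: CNO is the intermediate field that the local-table did not mention explicitly, and each step names the relation to select from and join on.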



3.5 LIMITATIONS OF THE NL ACCEPTED

Normal English queries are accepted subject to the following conditions:

* the language should include those English queries with minor errors in concord and tense;

* the query should be structured so that fields can be retrieved unambiguously.

The restrictions stipulated by the above criteria on the language are given below.

3.5.1 ERROR IN REFERENCING

In spite of repeated suggestions by orthodox grammarians, the usage of loose structure and improper dereferencing continues to be a part of our life.

Poor structure: "who taught a course in 83 2nd sem which is offered by the dept of cs"

Better structure: "who taught in 83 2nd sem a course which is offered by the dept of cs"

or

"who taught a course which is offered by the dept of cs and which is taught in 83 2nd sem"

The system accepts the second type of construction but not the former. The rule to be followed to construct such a structure is:

"The relative pronoun should be as close to the referent as possible, and if more than one clause references the same referent, the clauses must be connected by conjunctions."

3.5.2 QUALIFYING A PROPER NOUN

A proper noun is considered by the system to be a value of a field (of the DB), so qualifying it by a clause is considered erroneous.

Consider the example

Not accepted (referencing a proper noun):

"who taught kumar who is a student of the dept of computer science"

Acceptable way:

"who taught the student who is having the name as kumar and who is in the dept of computer science"

To avoid structures of the former type, the following rule can be used:

"Whenever there is a clause specifying more about a proper noun (or, precisely, whose referent is a proper noun), replace the proper noun by the appropriate noun, introduce a new clause instantiating that noun to the proper noun, and connect the new clause to the specifying clause by a conjunction."


3.5.3 NOISE WORDS

No noise words are allowed in the system language. Examples are "tell me", "please", etc.

All the queries accepted by our system are of the "wh" type.

Here, the "value also specified" in the yes/no query is cno = cs415. So replace it by "a course" and introduce a new clause instantiating "a course" to "cs415". Finally, replace the yes/no question by a wh-question. So it becomes:

"what is the course offered by the dept of cs and which is cs415"


3.5.5 ACCEPTABLE ERRORS

The system accepts some small errors in concord and articles. Examples are as below:

"who are the teacher of cs412"

(error in concord)

"who are a teachers of cs512"

(error in article)

These aspects provide the boundaries of the language accepted by our system. A detailed discussion of the grammar accepted by the system is given in chapter 4.



CHAPTER 4 


NORMALIZATION AND PARSING 

4.1 INTRODUCTION 

Although normalization and parsing, as introduced in chapter 3, are two different passes, both are combined into a single chapter because of the similarities in the formalisms used for them. Each is discussed in detail in the sections that follow.

4.2 NORMALIZATION

The design considerations that led to the decision to normalize a query before parsing it are discussed in detail in section 3.3. A brief recapitulation is given here.

Consider the queries

"who taught the course of data structures in the dept of cs"

"who taught data structures in the dept of computer science"

Both the above queries have the same structure except for a change in the descriptions of the fields "cname" and "dept". To enable the parser to recognise such variations in the field descriptions and map them to the same structure, we decided to replace the description strings by a unique representation scheme as follows:

"(fname relational-operator value)"

where "fname" is the field name, relational-operator is one of "=", "<" and ">", and value is the value that is provided by the description.

In the above example, the field description of "course" may be represented using this schema as follows:

"cname = (data structures)"

Similarly, the string describing the field "dept" can be replaced by "dept = (computer science)".

Replacing the field descriptions with the canonical form, we get the following query:

"who taught the cname = (data structures) in the dept = (computer science)"
Such a unified representation scheme, which replaces the English description, is what we call the "canonical form". In the example above, "cname = (data structures)" is the canonical representation. The query so developed after replacing all the descriptions is called the "normalised query", and the process is called "normalization".

The objectives of the normalization process are:

* to accept the field descriptions specified by a grammar, which allows arbitrary descriptions to be defined by the user;

* to generate the canonical representation of each field description.

There are two main parts in the normalization process:

(i). a data-driven program called the normalizer;

(ii). data structures containing the field descriptions and the actions which are taken on successful field recognition; these actions generate the canonical representation. These data structures are called filters, and the actions and the filters together are called filter networks.





The data-driven normalizer takes the input query, scans it word by word, and compares it with the field descriptions. When a field description is found, it performs the actions corresponding to the recognised field. These actions replace the description with the canonical form. After the query is completely scanned, the normalised query is returned.

4.2.1 FILTER NETWORKS AND NORMALIZER

The design of the filter networks should meet two objectives: they should accept a language as close to English as possible, and they should be easily adaptable to a new DB. To meet both of them, we decided to use ATNs as the filter networks, i.e. a set of ATNs is defined to contain the grammar of the field descriptions. The normalizer, then, is an ATN interpreter. The ATN formalism we follow here is described in chapter 2.

All the actions of the ATNs are contained in a table called the action-table. The action-table is an assoc list with the following organisation:

(actions
    (<ATN-name-1> <actions>)
    (<ATN-name-2> <actions>)
    ...
)

On successful completion of an ATN (as indicated by the DONE arc in our formalism), the actions corresponding to that ATN are performed by the normaliser.


The ATNs used in this pass for describing the fields are given in appendix 1. The first two ATNs, named S1 and E, do not describe any fields but provide control over the normalisation process. We consider them in detail below. Throughout this discussion, *s* means the query yet to be processed, *value* means the normalised query produced so far, and *w* means the current word.

The ATN S1 has two arcs: the first arc is a termination condition and the second is a self-loop. If the first succeeds, which means that *s* is null and the input query has been completely scanned, the normaliser goes to S1-1 and thereafter returns *value*. Otherwise, the second arc passes control to E, which replaces the first few words by their canonical representation if they describe a field; otherwise, E treats the current word as the start of no description and puts the word as it is in *value*. In either case, control returns to the node S1 because of the self-loop, and the above set of actions repeats.

The ATN E checks whether the current word *w* is the starting word of a field description.

Each arc of the ATN E is a call on another ATN describing a field. If any one of the calls is successful, i.e. if there is a field description starting with *w*, then the corresponding ATN returns the canonical form of the field recognised. On completion, the ATN E replaces the description with this canonical form.

On the other hand, if none of the arcs succeeds, then the last arc, "skip", is taken; it removes the current word from *s* and keeps it in *value*, thus treating it as the start of no description.
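The control provided by S1 and E amounts to the loop below. It is a Python sketch under stated assumptions: field recognisers are ordinary functions standing in for the field-describing ATNs, each returning the canonical form and the number of words it consumed (or `None`), and `course_phrase` is a hypothetical, much simplified recogniser for course numbers such as cs450. The variables `s`, `value` and `w` mirror *s*, *value* and *w*.

```python
import re


def course_phrase(s):
    """Hypothetical recogniser: a course number like 'cs450' -> 'cno = (cs450)'."""
    if s and re.fullmatch(r"[a-z]{2}\d{3}", s[0]):
        return f"cno = ({s[0]})", 1
    return None


RECOGNISERS = [course_phrase]   # the field arcs of E, tried in order


def normalise(query):
    s = query.split()            # *s*: words still to be processed
    value = []                   # *value*: normalised query produced so far
    while s:                     # S1: the termination arc fails while *s* is not null
        w = s[0]                 # *w*: the current word
        for recognise in RECOGNISERS:        # E: try each field-describing ATN
            result = recognise(s)
            if result is not None:
                canonical, consumed = result
                value.append(canonical)      # replace the description
                del s[:consumed]
                break
        else:
            value.append(w)      # "skip" arc: start of no description
            del s[0]
    return " ".join(value)       # S1-1: return *value*
```

For example, `normalise("who taught cs450")` returns `"who taught cno = (cs450)"`. Porting to a new DB changes only `RECOGNISERS`, which is the data-driven property the text claims for the normalizer.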

We have designed about 20 ATNs for 5 different fields. We shall show the considerations that went into the design of these ATNs by illustrating the design of the ATNs for one filter network.

Consider the following descriptions of a department phrase:

"department of computer science"

"dept of cs"

"dept = cs"

"computer science dept"

Each of the above descriptions has two parts:

(i). The description which gives the department name, indicated by strings like "cs" and "computer science".

(ii). The string of words having tokens like "dept of", "department of", "dept =", etc.

We design two ATNs to accept the two parts.

The ATN shown in Fig 4.1 will accept the name of a department consisting of two words. The first word is any one of the names "computer", "electrical", "chemical", etc. The second word is any one of "science", "engineering", etc. Arc 1 accepts the first word if it is any of the words in the list bound to *departments*, whereas arc 2 accepts the second word if it is any of "science", "engineering", etc.

To handle abbreviations, we add a third arc, as shown in Fig 4.2.
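The department-name ATN of Figs 4.1 and 4.2 can be approximated by a function that tries the two-word path (arcs 1 and 2) and then the abbreviation arc. The word lists and the abbreviation table are hypothetical contents for *departments* and the "science/engineering" list, and expanding "cs" to "computer science" is an assumption modelled on the normalised queries in this chapter. The function returns Y in the form (dp*x*x computer science), together with the number of words consumed.

```python
DEPARTMENTS = ["computer", "electrical", "chemical"]   # stands in for *departments*
SECOND_WORDS = ["science", "engineering"]
ABBREVIATIONS = {"cs": ["computer", "science"], "ee": ["electrical", "engineering"]}


def dept_name(words):
    """Return (Y, words consumed) for a department name, or None.

    Arcs 1 and 2: '<department> <science|engineering>'.
    Arc 3: a single abbreviation such as 'cs'.
    """
    if len(words) >= 2 and words[0] in DEPARTMENTS and words[1] in SECOND_WORDS:
        return ("dp*x*x", words[0], words[1]), 2
    if words and words[0] in ABBREVIATIONS:
        return ("dp*x*x", *ABBREVIATIONS[words[0]]), 1
    return None
```

Both `dept_name(["computer", "science", "dept"])` and `dept_name(["cs"])` yield the same Y, so later actions never need to know which surface form was used.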

At the end of every ATN, a variable Y is bound to

"(<name of the ATN> <syntactic structures that led to the success>)"

So we have to design the actions for this ATN in such a way that they take this Y and produce the formal output. By convention, the output of an ATN is kept as

"(<ATN-name> (field value))"

If "computer science" is recognised, then Y will be bound to (dp*x*x computer science). We want the pattern (dp*x*x (computer science)) to be returned as the result of the actions. To achieve such a change, the following program is written and kept in the action-table corresponding to this ATN:

(list 'dp*x*x (cdr y))

To design the second ATN, which takes care of part (ii) of the department description, we have the following specification:

(i). "dept" or "department", optionally followed by "of" or "=", and then necessarily followed by a department-name specification;

(ii). a department-name specification optionally followed by "department" or "dept".

The ATN to recognise this is shown in Fig 4.3.

The actions to be performed on successful recognition by this ATN should produce the canonical form. At the end of the ATN, Y is bound to a list containing all the syntactic structures that led to the success of the ATN. For example, if the structure recognised is "dept of computer science", then Y will be bound to (dept-phrase dept of (dp*x*x (computer science))). But we want the structure

(dept = <dept-name>)

So, to achieve this, we write a program as follows:

(list 'dept '= (cadr (assoc 'dp*x*x y)))

This program is kept in the action-table corresponding to this ATN.
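For readers less used to LISP list functions, the two action programs and their place in the action-table can be mirrored in Python, with tuples playing the role of lists. This is a sketch: `ACTION_TABLE` and `on_done` are hypothetical names, the dept-phrase action is written to produce the (dept = <dept-name>) form described above, and the search for the dp*x*x sub-structure looks only at sub-lists of Y, since an association-list lookup expects sub-lists and Y also contains plain words such as "dept" and "of". Returning Y unchanged when an ATN has no action is likewise an assumption.

```python
def dp_action(y):
    """(list 'dp*x*x (cdr y)): group the department-name words into one sub-list."""
    return ("dp*x*x", tuple(y[1:]))


def dept_phrase_action(y):
    """Find the (dp*x*x (name ...)) element of Y and build (dept = (name ...))."""
    for item in y:
        if isinstance(item, tuple) and item and item[0] == "dp*x*x":   # like assoc
            return ("dept", "=", item[1])                               # cadr
    return None


# Action-table: an association list from ATN name to its action.
ACTION_TABLE = [("dp*x*x", dp_action), ("dept-phrase", dept_phrase_action)]


def on_done(y):
    """Run the action of the ATN named by the first element of Y at its DONE arc."""
    for atn_name, action in ACTION_TABLE:
        if atn_name == y[0]:
            return action(y)
    return y   # assumption: no registered action, so Y is passed back unchanged
```

Because the lookup searches Y rather than relying on a fixed position, the same action serves both "dept of computer science" and "computer science dept".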

Thus, we have seen how to design the ATNs for the "dept" field. The remaining ATNs can be designed similarly.

We conclude this section with an example:

"who taught cs450"

The first arc on the ATN S1 fails, as *s* is not null. The next arc calls E. In the ATN E, the first five arcs fail because the starting word "who" does not satisfy any of the ATNs describing the fields. So the last arc, "skip", takes "who" from *s* and keeps it in *value*. The same happens with the word "taught".

The next word, "cs450", fails on the "dept-phrase" and "name-phrase" arcs of E, but it succeeds on "course-phrase" as follows. In the ATN corresponding to "course-phrase", the first arc "(parse cno*a)" is taken and control passes to the ATN cno*a.

There is only one arc in the ATN cno*a, and this arc succeeds because all three conditions on the arc are satisfied. This produces the structure "(cno*a cs450)". Control returns to the ATN "course-phrase".

On return, the ATN "course-phrase" succeeds and forms the structure "(cno = (cs450))", and control returns to the ATN E. In E, control goes to the node E1 and from there to the top-level ATN S1. The structure returned by the ATN "course-phrase" is substituted in the sentence. By now *s* has become null, so the first arc on S1 succeeds and control goes to S1-1, where *value* is returned. So we get the structure

"who taught cno = (cs450)"
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4,3 PARSING 

As discussed in section 3.2.1, we decided to use the
syntactic grammar approach for our system. In this approach
there are two stages. In the first stage the input query
is taken and a parse is produced, while in the second stage
the parse is the input and a formal query is the output.

We use the ATN formalism for parsing (stage 1). The ATN
formalism is discussed in detail in chapter 2. There are
two basic parts in it:

(i). The lexicon, and

(ii). The grammar and parser.

4.3.1 LEXICON

The constituents of the lexicon are the words used in
the queries. The lexicon is logically divided into two
parts, the core lexicon and the DB-specific lexicon. The
core lexicon contains words which are DB-independent, like
"is", "are", "who" etc. The DB-specific lexicon contains
words which are specific to the DB under consideration, like
"teach", "offer", etc.
As discussed in section 3.2.1, we keep in the lexicon
the DB-specific information of each word along with its
lexical category. This is done to ease the process of
local-table generation.

The logical structure of each entry of the lexicon is
as follows:

(word (lex-info) (DB-specific info))

The lex-info gives the lexical category to which the
particular word belongs. The following lexical categories
are recognised:

(i). noun
(ii). verb
(iii). det (also known as article)
(iv). gerund
(v). past-part (past participle)
(vi). prep (preposition)
(vii). int-pro (interrogative pronoun)
(viii). to-be
(ix). prop-noun (proper noun)
(x). an-finite
(xi). ac-prep
(xii). ps-prep
(xiii). eq

Almost all of them are well-known categories of English
grammar. However, the following need special mention:
(i). ac-prep: These are the prepositions which are
used with the active-voice form of the verb, as in

"who is teaching in the dept of cs"

"what was taught to kumar"

(ii). ps-prep: It is exclusively the "by" used in the
passive voice.

(iii). eq: It is any one of the three symbols "=", "<" or ">".

(iv). prep: Any preposition, both ac-prep and
ps-prep.

A word can belong to more than one category. We have
to give the list of categories to which the word belongs.
An example is shown below.

((to in of) (ac-prep prep) (no))

It shows that the words "to", "in" and "of" belong to two
categories, "ac-prep" and "prep". "(no)" indicates that
there is no DB-specific information associated with these
words.
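The lexicon layout above can be sketched in Python as a minimal illustration, assuming that entries are written as tuples rather than Lisp lists. The names `LEXICON_ENTRIES`, `build_lexicon` and `has_category` are hypothetical. The "by", "who" and "course" entries are our own examples, built from the categories and DB-specific information described in this chapter. Because a word may have several categories, each word is given a set of them, so a question such as "may this word be an ac-prep?" reduces to a set-membership test.

```python
# Lexicon entries in the (word (lex-info) (DB-specific info)) layout, written
# as Python tuples. A tuple of words stands for several words sharing one entry.
LEXICON_ENTRIES = [
    (("to", "in", "of"), ("ac-prep", "prep"), ("no",)),
    ("by", ("ps-prep", "prep"), ("no",)),
    ("who", ("int-pro",), ("person",)),
    ("course", ("noun",), ("thing", "cno")),
]


def build_lexicon(entries):
    """Expand the entries into a map: word -> (set of categories, DB info)."""
    lexicon = {}
    for words, categories, db_info in entries:
        if isinstance(words, str):
            words = (words,)
        for word in words:
            lexicon[word] = (frozenset(categories), db_info)
    return lexicon


def has_category(lexicon, word, category):
    """True if the word may be used in the given lexical category."""
    entry = lexicon.get(word)
    return entry is not None and category in entry[0]


LEXICON = build_lexicon(LEXICON_ENTRIES)
print(has_category(LEXICON, "of", "ac-prep"))  # prints: True
print(has_category(LEXICON, "of", "ps-prep"))  # prints: False
```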
4.3.2 DB-SPECIFIC INFO

The nouns and verbs used in a particular DB may have
special meanings. So, the fields which the nouns and verbs
refer to are represented in a special slot called DB-info.

For nouns, the information is indicated by a single
slot tagged "person" or "thing". Whether it is a person or
a thing is decided by the int-pro used to refer to the noun.

The field referred to by the noun "course" is "cno".
So, we keep the DB-specific information of "course" as
follows:

(course (noun) (thing cno))

For verbs, the logical organisation is as follows:

(<word> (verb)
        (doer
          (person ...)
          (thing ...))
        (done
          (person ...)
          (thing ...)))

doer: Those nouns which can act as subjects of the
verb in active voice, or as objects of the verb in passive
voice, are represented in this slot.

For example, "A teacher offers ..." or "... is offered
by a teacher";

"A dept offers ..." or "... is offered by a dept".

So, the doer slot of "offer" is

(doer (person tno) (thing dept))


done: Those nouns which can act as the objects of the
verb in active voice, or as subjects of the verb in passive
voice, are put in this slot.

For the same verb "offer", the done slot is as follows:

(done (person sno) (thing cno))

So, the complete representation of the verb "offer" is

(offer (verb)
       (doer (person tno) (thing dept))
       (done (person sno) (thing cno)))
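The doer and done slots can be read as a rule that selects a slot from the voice of the verb and the position of the noun. The sketch below is a rough illustration of that reading. It assumes a dictionary rendering of the "offer" entry; `slot_for` and `field_for` are hypothetical helper names, and "object" in the passive means the noun after "by".

```python
# DB-specific info of the verb "offer" as nested dictionaries.
OFFER = {
    "doer": {"person": "tno", "thing": "dept"},
    "done": {"person": "sno", "thing": "cno"},
}


def slot_for(voice, position):
    """Which slot ("doer" or "done") describes a noun in this position."""
    if voice == "active":
        return "doer" if position == "subject" else "done"
    if voice == "passive":
        # Passive: the grammatical subject is acted upon, and the noun
        # after "by" is the doer.
        return "done" if position == "subject" else "doer"
    raise ValueError(f"unknown voice: {voice!r}")


def field_for(verb_info, voice, position, kind):
    """Field that a person- or thing-type noun stands for, given voice and position."""
    return verb_info[slot_for(voice, position)][kind]


# "... is offered by a teacher": agent of a passive verb, person type
print(field_for(OFFER, "passive", "object", "person"))  # prints: tno
```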



4.3.3 PARSER AND GRAMMAR


The grammar is described by a set of ATNs, and the
driver routine is the ATN interpreter. The formalism is
discussed in chapter 2.

The ATNs used in this phase are shown in appendix 2.
We discuss the design of the ATNs in detail below.

The queries accepted by our grammar have a main
clause (henceforth MC) followed by an arbitrary number
of subordinate clauses (henceforth SC) and/or compound
clauses (COMP-CL). The difference between a subordinate
clause and a compound clause lies in their referents. The
referent of a subordinate clause is the last noun of the
immediately preceding clause, while the referent of a
compound clause is the noun referred to by any one of the
previous clauses. The following query illustrates all
three types of clauses.

"who taught a course which is offered in the dept of ee
and who is in the cs dept"
The main clause is "who taught a course", the
subordinate clause is "which is offered in the dept of ee"
and the compound clause is "and who is in the cs dept".

The structure of the query is MC followed by SC
followed by COMP-CL. The SC refers to the last noun of the
preceding clause (here "course"), while the COMP-CL refers
to "teacher", which is referred to by the MC. This major
structure explains the top-level ATN S shown in Fig 4.4.
The arcs from 1 to 3 and from 1 to 4 check whether
the next clause is an SC or a COMP-CL. If the next clause is
an SC, its first word is an int-pro, whereas if it is a
COMP-CL, the int-pro should be in one of its first two words.
The arc from 1 to 2 checks whether the query is completely
parsed.
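The clause-boundary test of the top-level ATN S can be approximated on a plain word list. This is a simplified sketch and not the ATN itself. It assumes that the int-pros are "who", "which" and "what", and that "and" is the only word that may precede the int-pro of a COMP-CL. `INT_PROS`, `CONNECTIVES` and `split_clauses` are hypothetical names. A clause whose first word is an int-pro is taken to be an SC, and a clause whose second word is the int-pro is taken to be a COMP-CL.

```python
INT_PROS = {"who", "which", "what"}
CONNECTIVES = {"and"}  # assumed: words that may precede the int-pro of a COMP-CL


def split_clauses(words):
    """Split a query into [(clause_type, clause_text), ...], MC first."""
    boundaries = [0]
    for i in range(1, len(words)):
        if words[i] in INT_PROS:
            # With a leading connective the clause starts one word earlier,
            # so its int-pro is its second word.
            boundaries.append(i - 1 if words[i - 1] in CONNECTIVES else i)
    boundaries.append(len(words))

    clauses = []
    for n, (lo, hi) in enumerate(zip(boundaries, boundaries[1:])):
        clause = words[lo:hi]
        if n == 0:
            kind = "MC"
        elif clause[0] in INT_PROS:
            kind = "SC"        # int-pro is the first word
        else:
            kind = "COMP-CL"   # int-pro is the second word
        clauses.append((kind, " ".join(clause)))
    return clauses


query = "who taught a course which is offered in the dept of ee and who is in the cs dept"
for kind, text in split_clauses(query.split()):
    print(kind, "|", text)
```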

Let us see the design considerations of MC. The ATN
for MC is shown in Fig 4.4.

Since our grammar accepts only interrogative queries,
the first word of any query is an interrogative
pronoun (henceforth int-pro). So, the first arc of MC
accepts an int-pro, as shown in Fig 4.4.



[Fig 4.4: The ATNs of "S" and MAIN CLAUSE]


The next word of an interrogative query is a verb. The
verb can be a simple to-be type of verb, or it can be the
passive or active voice of verbs like "teach", "take" etc.
All of these combinations are recognised by the ATN
verb-phrase. So, the next arc recognises this verb-phrase.

The next syntactic structure is the object of the verb
in the previous syntactic structure. It can be a noun, or an
article followed by a noun. The ATN NP recognises these
structures. So the next arc is NP.

These three arcs constitute the first part of MC.
These three arcs also indicate what noun is referred to by
the MC. We shall illustrate with examples.

"who teaches a course ..."

"who is the teacher ..."

Thus, the verb phrase can implicitly contain the
needed noun, as in example 1 above, or the needed noun can be
explicitly given, as in example 2 above ("teacher").

The design of the remaining part of MC is as follows.

In an interrogative query, we first indicate what we
want and then provide other fields from which this needed
field has to be found. The following examples illustrate
this.

"who is the teacher of cname = (data structures)"

"who is the teacher of the student of the dept offering
cno = ee424"

In the first example the needed field is "teacher" and
the known field "cname" is given as an instantiated value.
The connective between the noun "teacher" and the noun
"cname" is the prep "of".

In the second example, the known fields are not
instantiated, but are given as a chain of connectives
followed by nouns.

Thus there are two basic ways in which the known fields
can be given. The first consists of a series of
connectives followed by noun phrases (as in "teacher of the
student of the dept offering cno ..."), whereas the second
consists of instantiations. Two parts are designed in the
ATN for MC to handle both of these. The loop structure of
the ATN in Fig 4.4 shows them.
The connective used in the first part can be a simple
preposition, as in "teacher of student ...", or it can be a
gerund, as in "dept offering ...". So, to meet all these
possibilities of a connective (or a qualifier), an ATN
Qualifier is designed. This recognises all the possible
types of a qualifier. So the next arc, between nodes 3 and 4,
is this Qualifier. The remaining arcs between nodes 3
and 4 handle other possibilities of a qualifier. After
the qualifier the next word is a noun. So, an arc to
accept noun phrases is put between nodes 4 and 5. After
this there are two branches, corresponding to the two parts
discussed earlier.

The first part, namely the series of qualifiers followed
by noun phrases, can be recognised by providing an arc back
to node 3 from node 5, as shown. Thus structures like
example 2 above are recognised by the loop "nodes 3 to 4,
4 to 5, 5 to 3".

The second part, namely the instantiation part,
contains a noun followed by a relational operator followed by
a proper noun. The relational operator and the proper
noun are recognised by the two ATNs compare and np
respectively. So, the next two arcs in the second branch
are from node 5 to 6 along compare and from 6 to 7 along np,
followed by an arc from 7 back to 3 along a jump or conj.
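The arcs of the MC network described above can be written as a small transition table and searched with backtracking. This sketch makes several assumptions. The input is taken to be already grouped into constituents by the sub-ATNs (verb-phrase, NP, qualifier, compare). The structure building and arc conditions of a real ATN are left out. Node 3 is assumed to be the pop node, since the figure is not reproduced here. `MC_ARCS`, `FINAL_NODES` and `accepts` are hypothetical names, and a label of `None` marks a jump arc.

```python
# Arcs of the MC network as (from_node, label, to_node).
# A label of None marks a jump arc, which consumes no input.
MC_ARCS = [
    (0, "int-pro", 1),
    (1, "verb-phrase", 2),
    (2, "np", 3),
    (3, "qualifier", 4),
    (4, "np", 5),
    (5, None, 3),        # qualifier-phrase loop: 3 -> 4 -> 5 -> 3
    (5, "compare", 6),
    (6, "np", 7),        # the NP holding the proper noun
    (7, None, 3),        # instantiation loop back to 3 by a jump ...
    (7, "conj", 3),      # ... or by a conjunction such as "and"
]
FINAL_NODES = {3}        # assumed pop node


def accepts(constituents, node=0, pos=0):
    """True if the constituent labels can be consumed from node 0 to a final node."""
    if pos == len(constituents) and node in FINAL_NODES:
        return True
    for src, label, dst in MC_ARCS:
        if src != node:
            continue
        if label is None:
            if accepts(constituents, dst, pos):
                return True
        elif pos < len(constituents) and constituents[pos] == label:
            if accepts(constituents, dst, pos + 1):
                return True
    return False
```

For example, the query "who taught a course in year = (84) in the dept = (computer science)" reduces to the labels int-pro, verb-phrase, np, qualifier, np, compare, np, qualifier, np, compare, np, which the network accepts.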

This finishes the design of MC. We shall illustrate
with an example how various structures are
recognised by MC.

"who taught a course in year = (84) in the dept =
(computer science)"

The top-level ATN S is taken, and its only arc,
"(parse MC)", transfers control to the ATN MC. The various
words in the above query are parsed by MC as follows.

"who": The first arc recognises this as an int-pro and
the following structure is constructed:

(S (MC (INT-PRO (PERSON) WHO)))

"taught": The second arc, "(parse verb-ph)", recognises
this as follows.

The ATN verb-ph is shown in Fig 4.5. The first arc,
"parse active-voice", takes control to the ATN active-voice,
also shown in Fig 4.5. Since "taught" is not of the type
to-be, the first arc fails. The next arc checks for a verb.
Since "taught" is a verb, it succeeds. So, this returns

"(ACTIVE-VOICE (VERB (DB-INFO) TAUGHT))" to the calling
ATN, namely "verb-phrase". This ATN checks whether any prep
follows it. Since the next word is not a prep, it takes the
jump arc and succeeds, returning

"(VERB-PHRASE (ACTIVE-VOICE (VERB (DB-INFO) TAUGHT)))"
to the calling ATN, namely MC. The structure constructed
so far is

(MC (INT-PRO (PERSON) WHO)
    (VERB-PHRASE (ACTIVE-VOICE (VERB (DB-INFO) TAUGHT))))

"a course": The next arc is "parse NP". The ATN NP
recognises the two words "a" and "course" as a noun phrase.

So, the structure formed is

(MC (INT-PRO (PERSON) WHO)
    (VERB-PHRASE (ACTIVE-VOICE (VERB (DB-INFO) TAUGHT)))
    (NP (DET (NO) A) (NOUN (CNO) COURSE)))
Control comes back to node 3 via the jump arc
between node 5 and node 3.

"in": The Qualifier arc consumes this.

"year = (84)": These three words are consumed
successively by the three ATNs NP, COMPARE and NP, through
the three arcs from node 4 to 5, node 5 to 6 and node 6 to 7.

Control comes back to node 3 from node 7 via the
jump arc.

The remaining structures of the query are recognised
similarly, and the parse returned is as follows.

(S
  (MC (INT-PRO (PERSON) WHO)
      (VERB-PHRASE (ACTIVE-VOICE (VERB (DB-INFO) TAUGHT)))
      (NP (DET (DB-INFO) A) (NOUN (CNO) COURSE))
      (QUALIFIER (NO-INF (DB-INFO) IN))
      (NP (NOUN (YR) YEAR))
      (COMPARE (EQ (DB-INFO) =))
      (PROP-NOUN (DB-INFO) (84))
      (QUALIFIER (NO-INF (DB-INFO) IN))
      (NP (NOUN (DEPT) DEPT))
      (COMPARE (EQ (DB-INFO) =))
      (PROP-NOUN (DB-INFO) (COMPUTER SCIENCE))))

We shall now see the design considerations of some of the
remaining ATNs.


4.3.4 DESIGN OF NP

The ATN is shown in Fig 4.5. A noun phrase consists of an
optional determiner followed by a noun or a proper noun.
The design of NP accounts for these cases.


4.3.5 DESIGN OF PROPER NOUN

The approach we followed is more economical than the
approaches in PLANES and in LADDER, because the lexicon does
not have to hold every proper noun.

In LADDER, for example, the names of the ships are a
part of the production rules, and so they are kept in the
lexicon. We do not load the lexicon with the proper nouns.
Instead, we use the following approach.
The proper nouns are divided into two categories:

(i). Those which can be recognised during the
normalisation process.

(ii). Those which can be recognised during parsing.

The first type of proper nouns are like "dept of cs",
"data structures course", etc. Such proper nouns can be
recognised at the normaliser stage.

The second type of proper nouns occur in queries like
"who taught kumar", "what was taught by DRsandal", etc.
Such proper nouns can be recognised only from the context of
the verb in which they are present. On recognising such a
proper noun, the system confirms it by asking the user about
it and then proceeds further. Whether such a proper noun
represents a teacher name or a student name is decided from
the verb, the voice, and whether it is a subject or an object.

Once a proper noun is defined, if the same proper
noun is used by the user in the next query, the system
proceeds without asking again. This is achieved by keeping
the user-defined proper nouns temporarily in the lexicon.
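Keeping confirmed proper nouns for the rest of the session amounts to a small cache placed in front of the confirmation dialogue. In this hypothetical sketch, `confirm` stands in for asking the user. The caller supplies the field guessed from the verb, the voice and the position. A rejected guess is not cached. `ProperNounCache` is an illustrative name, not part of the original system.

```python
class ProperNounCache:
    """Proper nouns confirmed by the user, kept only for the current session."""

    def __init__(self, confirm):
        # confirm(word, field) -> bool; in the real system this asks the user.
        self.confirm = confirm
        self.known = {}

    def resolve(self, word, guessed_field):
        """Return the field for a proper noun, asking the user at most once per word."""
        if word in self.known:
            return self.known[word]
        if self.confirm(word, guessed_field):
            self.known[word] = guessed_field
            return guessed_field
        return None  # the user rejected the guess; nothing is cached


asked = []


def confirm(word, field):
    asked.append((word, field))
    return True


cache = ProperNounCache(confirm)
cache.resolve("kumar", "sname")   # first query: the user is asked
cache.resolve("kumar", "sname")   # next query: answered from the cache
print(asked)  # prints: [('kumar', 'sname')]
```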
4.3.6 VERB-PHRASE

The verb-phrase ATN is shown in Fig 4.5. A verb phrase
can be a passive-voice or an active-voice, optionally
followed by a preposition. This accounts for the design of
the ATN "verb-ph".

The ATN corresponding to active-voice is shown in Fig
4.5. The active voice can be a simple "to-be", a "to-be"
followed by a "gerund", or a simple "verb". These are shown
in the ATN "active-voice" of Fig 4.5. The remaining arcs
are used for the cases of the perfect tense and the perfect
continuous. Including these arcs, we get the complete ATN
"active-voice".

The design of the ATN "passive-voice" is similar.

The remaining ATNs are designed similarly.
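The alternatives of the active-voice ATN can be approximated as sequences of lexical categories. The first three patterns follow the text. The two perfect-tense patterns are assumptions, since the corresponding arcs are not listed here. Because a word may belong to several categories, each input position holds a set of categories. The sketch returns the longest match, which simplifies the ATN's ordered, backtracking arc search. `ACTIVE_VOICE_PATTERNS` and `match_active_voice` are hypothetical names.

```python
# Category sequences accepted as an active voice.
ACTIVE_VOICE_PATTERNS = [
    ("to-be",),                          # "is"
    ("to-be", "gerund"),                 # "is teaching"
    ("verb",),                           # "taught"
    ("an-finite", "past-part"),          # assumed perfect tense: "has taught"
    ("an-finite", "to-be", "gerund"),    # assumed perfect continuous: "has been teaching"
]


def match_active_voice(word_categories, pos=0):
    """End index of the longest active-voice match starting at pos, or None.

    word_categories holds one set of lexical categories per word, because a
    word may belong to more than one category.
    """
    best = None
    for pattern in ACTIVE_VOICE_PATTERNS:
        end = pos + len(pattern)
        if end > len(word_categories):
            continue
        if all(cat in word_categories[pos + k] for k, cat in enumerate(pattern)):
            best = end if best is None else max(best, end)
    return best


# "is teaching cname ...": the active voice covers the first two words
print(match_active_voice([{"to-be"}, {"gerund"}, {"noun"}]))  # prints: 2
```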



CHAPTER V


LOCAL-TABLE GENERATION 


5.1 NEED FOR LOCAL-TABLE GENERATION

To generate a formal query, we must identify what
fields are needed and what fields are given. But an NL
query, in general, does not give these fields explicitly.
Instead, they are embedded in the nouns, verbs and other
syntactic structures. The local-table generator extracts these
fields from the parse of the query and stores them in a data
structure called the "local-table". We illustrate this with an
example.

"who teaches the student in the dept = (computer
science)"

The word "student" implicitly conveys the field "sno".
Similarly, the verb "teaches" conveys that the "tno" field
is needed. The query also shows that "student" is to be
found using the instantiated field "dept", and that the
"teacher" is to be found using the "student" thus obtained.
Thus, there is a hierarchy along which the field has to be
found. We call this the "hierarchical tree". The subject of
the query, referred to by "who", is "tno"; this is the final
result of the query. We call this field the "needed field".
The field "sno", which is used to specify the path for finding
the needed field from the instantiated field, is called the
"specified field".

So, the entire information of the query can be
illustrated as in Fig 5.1.

The local-table generator extracts such information from
the query and stores it in the local-table.


5.2 DEFINITIONS OF TERMS USED

We define certain terms that are used in the rest of the
discussion. We use the example shown in Fig 5.2 as a
reference.

"who is the teacher of the student in the dept =
(computer science) taking a course which is offered by the
tname = DrRsanalal"

Fig 5.2. An example query.

The query shown above has two clauses, and the parser
produces a parse for each of them separately. The first
clause is an MC and the second is an SC. The following terms
are defined for each clause.
5.2.1 NEEDED FIELD


Each clause has a subject which is referred to by its
interrogative pronoun. The field corresponding to the
subject is called the "needed field".

In the first clause, "tno" (corresponding to the word
"teacher") is referred to by the interrogative pronoun
"who". So, "tno" is the needed field of MC. The second clause
refers to "cno" (corresponding to the word "course"), and so
"cno" is its needed field.

5.2.2 INSTANTIATED FIELD

A clause may instantiate some fields to some values.
These fields are called instantiated fields.

In the first clause the field "dept" is instantiated to
"(computer science)", and so "dept" is called an instantiated
field of the MC. Similarly, "tname" is an instantiated field of
SC.

5.2.3 EXTERNAL FIELD 

In a clause, a field may only be mentioned, while the
properties it satisfies are given in a later clause. Such a
field is called an external field, as its description is
external to the clause in which it is present.

In the example of Fig 5.2, "cno" (the word "course") is
mentioned in MC, but the way of finding it is given in SC. So,
it is called an external field in MC.


5.2.4 PATH FIELD 

Each clause may use some fields to specify the path for
finding the needed field from the instantiated fields.
These fields, used for specifying the path, are called path
fields.

In the above example, MC uses "student" as an
intermediate field to get the needed field "tno" from both
the external field "cno" and the instantiated field "dept".
In effect, it specifies that "sno" has to be found from
"dept" and "cno" (which is described by a later clause), and
that the "sno" so obtained must be used to get "tno". So, "sno"
is used for specifying the path, and hence it is called a path
field (of MC).

For SC there is no path field. The needed field has to
be obtained directly from the instantiated field.



5.2.5 HIERARCHY


There is a hierarchy on the various fields in a clause.
The hierarchy indicates how each field is to be found from
the others.

In the example above, MC indicates that

(i). "sno" has to be found from "dept" and "cno".

(ii). "tno" has to be found from the "sno" determined
above.

Thus there is a hierarchy among the fields; this
hierarchy is mapped into a tree data structure and stored in
the local-table. The mapping is as follows.

"If f1 is to be determined using the fields f2, f3, then
the fields f2, f3 must be put as sons of f1."

So, the needed field is put as the root of the tree
(henceforth "hierarchical tree"). Those fields which must
be used to determine it are made its sons. Each son has,
as its sons, the fields which determine it, and so on.
This process is repeated. The leaf nodes are either
instantiated nodes or external nodes.

For our example MC, the needed field is "tno", and so it
is put as the root. The field "sno" should be used to find
"tno", so it is put as a son of "tno". The "sno" has to be
found from the instantiated field "dept" and the external
field "cno". So, they are kept as sons of this node. The
final tree is as shown in Fig 5.3.
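The mapping rule can be sketched as a recursive function over a "determined by" relation. The sketch assumes that this relation has already been extracted and contains no cycles. `build_tree`, `leaves` and `DETERMINED_BY` are hypothetical names, and a tree is represented as a (field, sons) pair. For the MC of Fig 5.2, the leaves come out as the instantiated field "dept" and the external field "cno".

```python
def build_tree(needed, determined_by):
    """Root the tree at the needed field; the sons of a field are the fields it is found from."""
    sons = determined_by.get(needed, [])
    return (needed, [build_tree(field, determined_by) for field in sons])


def leaves(tree):
    """Leaf fields, left to right (instantiated or external fields)."""
    field, sons = tree
    if not sons:
        return [field]
    return [leaf for son in sons for leaf in leaves(son)]


# MC of Fig 5.2: "tno" is found from "sno"; "sno" is found from "dept" and "cno".
DETERMINED_BY = {"tno": ["sno"], "sno": ["dept", "cno"]}
mc_tree = build_tree("tno", DETERMINED_BY)
print(mc_tree)          # prints: ('tno', [('sno', [('dept', []), ('cno', [])])])
print(leaves(mc_tree))  # prints: ['dept', 'cno']
```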


5.3 LOCAL-TABLE STRUCTURE

A local-table is a data structure in which the
information extracted from a clause is stored. A query has
as many local-tables as it has clauses.

A local-table has four slots.

5.3.1 TYPE SLOT

The first slot gives the type of the clause for which
the local-table is generated. We know from the previous
chapter that there are three types of clauses: MC, SC and
COMP-CL. This slot records which of these the clause is.

5.3.2 NEED SLOT

This slot contains the needed field of the clause. So,
it is called the need slot.
5.3.3 MARS SLOT


This slot contains the hierarchical tree of the clause.


5.3.4 EXT SLOT

This slot contains the external field of the clause, if
any.


The local-tables of the example of Fig 5.2 are shown in
Fig 5.4.
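A local-table with its four slots can be sketched as a Python dataclass. The attribute names are ours; in particular, `tree` stands for the slot that holds the hierarchical tree. Trees are (field, sons) pairs. The check in `__post_init__` encodes the rule stated in section 5.4.1 that the needed field is the root of the tree. The two instances are our reading of the example of Fig 5.2 and do not reproduce Fig 5.4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalTable:
    clause_type: str           # type slot: "MC", "SC" or "COMP-CL"
    need: str                  # need slot: the needed field
    tree: tuple                # tree slot: hierarchical tree as (field, [sons])
    ext: Optional[str] = None  # ext slot: the external field, if any

    def __post_init__(self):
        # The needed field is always the root of the hierarchical tree.
        if self.tree[0] != self.need:
            raise ValueError("root of the tree must be the needed field")


# Local-tables for the two clauses of the example query of Fig 5.2.
mc_table = LocalTable(
    clause_type="MC",
    need="tno",
    tree=("tno", [("sno", [("dept", []), ("cno", [])])]),
    ext="cno",
)
sc_table = LocalTable(clause_type="SC", need="cno", tree=("cno", [("tname", [])]))
```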


5.4 FIELD EXTRACTION 

Our aim is to design an algorithm which takes the parse
of a query and returns a local-table.

There are basically two ATNs which accept the queries.
One is the ATN corresponding to MC and the other to SC. We
explain the design of the algorithm for extracting fields
from the MC; the process of extracting the fields from SC
is similar.

The ATN corresponding to MC is shown in Fig 5.5.
Semantically, it can be divided into two parts. The first
part encapsulates the different ways of expressing the
needed field of a clause. The second part indicates the
different hierarchical structures possible among various
fields. The first part consists of the arcs from nodes 0 to 1,
1 to 2 and 2 to 3 of Fig 5.5. The second part consists of
the remaining loop-like structure.

The first part is designed to accept any needed field,
either explicitly as a noun or implicitly in a verb, in
either passive or active voice. So, the design of an
algorithm to extract the needed field is based on these arcs.
It is discussed in section 5.4.1.

The second part is used to describe the hierarchy of
one field over another. So, this part is used to design
an algorithm for extracting the hierarchical tree. This is
discussed in section 5.4.2.

5.4.1 EXTRACTION OF NEEDED FIELD

The needed field of any clause can be extracted from
the first three syntactic structures, corresponding to the
first three arcs of the ATN in Fig 5.5.

There are various ways in which a needed field can be
present in a clause. They are shown below.

(1). "who is the teacher of cname = (data structures)"

(2). "who taught cname = (data structures)"

(3). "who was taught in the dept = (computer science)"

(4). "who is teaching cname = (data structures)"

The parses of these queries are shown in Fig 5.6.

In the first case (example 1 above), the verb phrase
has only a "to-be" type verb, which carries no
DB-specific information. The needed field is in the noun of
the next noun phrase (indicated as NP in the parse). Hence,
for such queries, which have "to-be" type verbs
as the main verbs, we can extract the needed field from the
noun of the NP following the verb-phrase.

For this example the needed field has to be obtained
from the noun "teacher". The field referred to by this noun
is in the parse beside the noun itself; it is contained
in the <db-specific info> slot of the noun. In this example
the field referred to by "teacher" is "tno".

Thus, when the main verb is of the type "to-be",
the needed field is to be taken from the noun phrase
represented by the third arc of Fig 5.5. The needed field
referred to by the noun of the noun phrase is available in the
parse beside the noun.
The rest of the examples have the needed field
implicitly mentioned in the verb or gerund. To extract the
needed field from the verb, we have to know what the verb
refers to. So, we shall see what the <DB-specific-info>
slot of a verb contains and then decide how to extract the
field referred to by the verb.

The <DB-specific-info> slot of any verb has two main
slots. One is the "doer" slot and the other is the "done" slot.
The "doer" slot indicates the subjects of the verb, while the
"done" slot indicates the objects of the verb.

The <DB-specific-info> slot of the verb "offer" is as
shown in Fig 5.7.

In Fig 5.7, the "doer" of "offer" can be a teacher
or a department. So, the fields corresponding to the two
nouns are put in the "doer" slot. The slot named "person"
indicates that the field in that slot is of "person" type.
Similarly, the non-person types of fields are put in the slot
"thing". In the above example, in the "doer" slot, the
field "tno" is of "person" type and so we keep it in the
"person" slot; the field "dept" is of non-person type,
and so we put it in the "thing" slot. The same rule
is followed in the "done" slot also.
We shall see how to use this representation to extract
the needed field. If the verb phrase is in active voice, as
indicated by the parser, then the needed field is the subject
of the verb. So, the needed field is taken from the "doer"
slot of the <db-specific-info>. Whether we have to take the
field in the "person" slot or the "thing" slot is decided
according to whether the "int-pro" is of "person" type (like
"who") or of "thing" type (like "which", "what").

Similarly, the "done" slot is taken when the
"verb-phrase" is in passive voice. So, the rule to extract
the needed field from the verb is as follows:

"Decide the major slot, "doer" or "done", according to whether
the "verb phrase" is in active voice or in passive voice
(this is indicated by the parse). The lower slot, "person"
or "thing", is decided according to whether the "int-pro" is of
"person" or "thing" type (this is available in the parse
beside the word corresponding to the "int-pro")."


So, by using this rule, the needed fields of the
examples 2, 3, 4, 5 are respectively "tno", "sno", "cno"
and "tno" (the <db-specific-info> of the verb "teach" is
the same as that of "offer", shown in Fig 5.7).
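The rule for the needed field, together with the to-be case, can be sketched as follows. The clause is assumed to arrive as a small dictionary that summarises the first three structures of the parse; the key names and `needed_field` are hypothetical. `TEACH` uses the DB-specific information that the text says "teach" shares with "offer".

```python
# DB-specific info of "teach"; the text states it is the same as that of "offer".
TEACH = {
    "doer": {"person": "tno", "thing": "dept"},
    "done": {"person": "sno", "thing": "cno"},
}


def needed_field(clause):
    """Needed field of a clause, from a summary of its first three structures.

    Assumed keys: "main_verb_is_to_be" (bool); "np_field" (field of the noun
    in the NP after the verb phrase); "voice" ("active" or "passive");
    "verb_info" (doer/done slots of the main verb); "int_pro_kind"
    ("person" or "thing").
    """
    if clause["main_verb_is_to_be"]:
        # A to-be main verb carries no DB-specific info, so the needed field
        # comes from the noun of the following NP.
        return clause["np_field"]
    major = "doer" if clause["voice"] == "active" else "done"
    return clause["verb_info"][major][clause["int_pro_kind"]]


# (1) "who is the teacher of ...": to-be main verb, NP noun "teacher"
print(needed_field({"main_verb_is_to_be": True, "np_field": "tno"}))  # prints: tno
# (3) "who was taught in the dept = ...": passive voice, person-type int-pro
print(needed_field({"main_verb_is_to_be": False, "voice": "passive",
                    "verb_info": TEACH, "int_pro_kind": "person"}))  # prints: sno
```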





After the needed field of a clause has been found, the
need slot of the clause's local-table is filled with this
field. The needed field also constitutes the root node of
the hierarchical tree of the clause.

The part of the ATN used for finding the needed field
consists of the first three arcs. The rest of the ATN is used to
decide the remaining part of the hierarchical tree. This is
discussed in the following section.


5.4.2 EXTRACTION OF HIERARCHICAL TREE

To build up the hierarchical tree, we need to recognise
the fields referred to by the noun phrases (indicated as NP
in the parse) and put them in the proper structure. The part of
the ATN used for this purpose is shown in Fig 5.8. It has
basically two parts. Each part recognises specific sentence
fragments with specific semantics.

The first part recognises qualifier phrases. Nodes 3
to 4, 4 to 5 and 5 to 3 constitute this part. A qualifier
phrase consists of a qualifier followed by an NP. The
following example shows it.

"who is the teacher of the course in the dept of the
student"

The parse of the above clause is shown in Fig 5.9. The
parse shows that it consists of a sequence of "QUALIFIER"
followed by NP.
The first three structures determine the needed field.
The syntactic structures that follow are a series of
qualifiers followed by noun phrases, i.e. a sequence of
qualifier phrases. The semantics of such qualifier phrases
is that the noun of each qualifier phrase determines the
noun which that qualifier phrase follows. So, we extract
the fields represented by each qualifier phrase and put them
in the hierarchical tree such that the tree reflects these
semantics.

In the example, "course" determines the "teacher", so
"cno" (the field corresponding to course) is the son of
"tno". Similarly, "dept" is the son of "cno". The hierarchical
structure is shown in Fig 5.10.

The second part recognises instantiated nouns. Nodes 3
to 4, 4 to 5, 5 to 6, 6 to 7 and 7 to 3 constitute this
part. An instantiated noun indicates that a noun is
instantiated to a value. The following example illustrates
this.

"who is teaching cname = (data structures) in the dept
= (computer science) and in the year = (84)"

In the above example the fields "cname", "dept" and
"year" are instantiated to "(data structures)",
"(computer science)" and "(84)" respectively. The parse of
the above query is shown in Fig 5.11.
This part recognises strings like "noun phrase",
"compare", "noun phrase (containing a proper noun)".
Such a string instantiates the noun in the first noun phrase
to the value indicated by the proper noun of the second noun
phrase.

To build the hierarchical tree for the query shown in
Fig 5.11, we have to add, as children, all the nodes
corresponding to the instantiations. In the above example,
"tno" is to be determined from the three instantiated fields.
The semantics is maintained in the hierarchical tree by
putting all the instantiated fields as sons of the needed
field. The hierarchical tree of the above query is shown in
Fig 5.12.

Thus we see that instantiated noun structures do not
contribute to the depth of the tree; instead, they
contribute to its breadth.

A practical query contains both parts, as shown
below.

"who is the teacher of the course in year = 84 offered
in the dept of sname = kumar"

The extraction of the hierarchical structure from such
queries follows both of the methods described above. This is
described in the following section.
5.4.3 COMPLETE EXAMPLE

We shall illustrate both the above steps by extracting
the needed field and the hierarchical tree for a typical
query.

The example we consider is shown below.

"who is the teacher of the course in year = 83 offered
in the dept of the sname = kumar"

The parse is shown in Fig 5.13.

The first three structures of the parse decide the
needed field. As explained in section 5.4.1, the needed
field has to be found from the noun of the NP following the
verb phrase. This is found from the parse to be "tno".

This needed field is kept as the root node of the
hierarchical tree. The remaining query, from which the
fields are extracted, is

"of the course in year ..."

The first noun, "course", belongs to the first type of
structures, namely the qualifier-phrase type of structure.
So the field represented by "course" ("cno") is added as a
son of the latest non-leaf node of the hierarchical tree
(here the root "tno"), and "cno" becomes the latest non-leaf
node.
The part of the query still to be processed is
"year = 83 offered in ...".

The noun "year" is part of an instantiation. So, it
is attached as a child of the latest non-leaf node, but we
remember that the latest non-leaf node is still "cno", and any
fields found further are to be attached to "cno", not to
"year". The field represented by the noun "year" is found
from the parse, and the node indicating the instantiation is
added to the hierarchical tree formed above, which is now as
shown in Fig 5.15.

The remaining query consists of a qualifier phrase
followed by the instantiation. So, the final tree structure
constructed is as shown in Fig 5.16.
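The procedure above, which attaches each new field to the latest non-leaf node, can be sketched in Python. The sketch assumes that the phrases after the needed field have already been grouped by hand into qualifier phrases ("qual") and instantiations ("inst"). It also assumes that "year" maps to the field "yr", as in the parse of section 4.3.3. `build_hierarchy` and `render` are hypothetical names. Under these assumptions the example query yields the tree printed below; this output is not taken from Fig 5.16.

```python
def build_hierarchy(needed, phrases):
    """Build the hierarchical tree from the phrases that follow the needed field.

    Each phrase is ("qual", field) for a qualifier phrase or
    ("inst", field, value) for an instantiated noun.
    """
    root = {"field": needed, "value": None, "sons": []}
    latest = root  # latest non-leaf node: new fields are attached here
    for phrase in phrases:
        if phrase[0] == "qual":
            # The noun of a qualifier phrase determines the latest node and is
            # in turn determined by what follows, so the tree grows deeper.
            node = {"field": phrase[1], "value": None, "sons": []}
            latest["sons"].append(node)
            latest = node
        elif phrase[0] == "inst":
            # An instantiation is a leaf; the latest non-leaf node is
            # unchanged, so the tree grows wider.
            latest["sons"].append({"field": phrase[1], "value": phrase[2], "sons": []})
        else:
            raise ValueError(f"unknown phrase kind: {phrase[0]!r}")
    return root


def render(node):
    """Compact text form of a tree, e.g. tno(cno(yr = 83))."""
    label = node["field"] if node["value"] is None else f'{node["field"]} = {node["value"]}'
    if not node["sons"]:
        return label
    return label + "(" + ", ".join(render(son) for son in node["sons"]) + ")"


# "of the course | in year = 83 | offered in the dept | of the sname = kumar"
phrases = [("qual", "cno"), ("inst", "yr", "83"), ("qual", "dept"), ("inst", "sname", "kumar")]
print(render(build_hierarchy("tno", phrases)))
# prints: tno(cno(yr = 83, dept(sname = kumar)))
```

A chain made only of qualifier phrases, as in "of the course in the dept of the student", produces a single path, tno(cno(dept(sno))). Instantiations alone produce siblings under the root.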


5.5 DEREFERENCING AND DESCOPING 

For queries consisting of more than one clause, which
are common, the SC and the COMP-CL refer to nouns of
preceding clauses. To determine the needed field of each of
these clauses (namely SC and COMP-CL), we must identify the
referents of these clauses. This is what we call
dereferencing.
Also, when there is more than one clause, it is quite
likely that a field is mentioned as external in one clause
but the way of finding it is specified in a later clause. We
eliminate this forward reference to make things easier for the
query processor (which takes the formal query from our system
and interacts with the DB). This is achieved by reordering the
local-tables. Another reason for reordering the local-table
structure is to identify those clauses which together
determine a field and to group those clauses together. This
process of reordering the local-tables is called descoping.

We shall illustrate with an example.

"who taught a course which is offered in dept =
(computer science) and which is offered in year = (84) and
who is in dept = (electrical engineering)"

The query contains 4 clauses. The first one is the MC,
the second one is an SC, and the third and fourth are
COMP-CLs. The parse of the query is shown in Fig 5.17.

The structure of the above query (in terms of clauses)
can be put as below.

MC SC COMP-CL1 COMP-CL2
The needed field of the SC can be found from the fact
that an SC always refers to the last noun of the immediately
preceding clause. So, the referent of the SC is "course"
(the corresponding field being "cno"). The referents of
COMP-CL1 and COMP-CL2 are to be found by looking backwards
at the needed field of each clause. The referent of a
COMP-CL is the needed field of the nearest clause whose
needed field could fit to be the needed field of the
COMP-CL. In the above example the referent of COMP-CL1 is
the needed field of the SC rather than the needed field of
the MC, because the needed field of the MC, the field "tno",
does not fit to be the needed field of COMP-CL1 (the string
"which is offered ..." in COMP-CL1 suggests that the referent
is a course rather than a teacher). Similarly, the needed
field of COMP-CL2 is found to be "tno". After dereferencing,
that is, after the needed fields of the clauses are found,
the hierarchical structure is extracted and the local-tables
are formed. The local-tables are as shown in Fig 5.18.

Thus, the rules to be followed to dereference are as
follows.

For an SC, the last noun of the immediately preceding
clause gives the referent.
The COMP-CLs are dereferenced by using standard
block-structure referencing rules. This is done as follows.
Since a COMP-CL refers to a noun already referred to by
another clause, to dereference a COMP-CL we start looking
backwards. The needed field of the nearest clause which
also fits to be the needed field of the present COMP-CL is
taken as the referent of the COMP-CL. Whether it fits or not
is decided from the interrogative pronoun and verb phrase of
the COMP-CL.
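The two dereferencing rules can be sketched as follows. This is a minimal illustration, not the system's implementation. The `Clause` record, its `fits` predicate (a hypothetical stand-in for the interrogative-pronoun and verb-phrase test) and the `last_noun_field` attribute are all assumptions. The usage runs the rules on the example query of this section.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Clause:
    kind: str              # "MC", "SC" or "COMP-CL"
    last_noun_field: str   # field represented by the clause's last noun
    needed_field: Optional[str] = None
    # Hypothetical stand-in for the interrogative-pronoun / verb-phrase test:
    # returns True if a candidate field can be this clause's needed field.
    fits: Callable[[str], bool] = lambda candidate: True


def dereference(clauses: List[Clause]) -> List[Clause]:
    """Fill in needed_field for every SC and COMP-CL, in clause order."""
    for i, clause in enumerate(clauses):
        if clause.kind == "SC" and i > 0:
            # Rule 1: an SC refers to the last noun of the immediately preceding clause.
            clause.needed_field = clauses[i - 1].last_noun_field
        elif clause.kind == "COMP-CL":
            # Rule 2: look backwards and take the needed field of the nearest
            # clause whose needed field also fits this COMP-CL.
            for prev in reversed(clauses[:i]):
                if prev.needed_field is not None and clause.fits(prev.needed_field):
                    clause.needed_field = prev.needed_field
                    break
    return clauses


def is_course(field_name: str) -> bool:   # "which is offered ..." describes a course
    return field_name == "cno"


def is_teacher(field_name: str) -> bool:  # "who is in dept ..." describes a teacher
    return field_name == "tno"


# The MC's needed field ("tno") is assumed to come from the parse.
clauses = dereference([
    Clause("MC", last_noun_field="cno", needed_field="tno"),
    Clause("SC", last_noun_field="dept"),
    Clause("COMP-CL", last_noun_field="year", fits=is_course),
    Clause("COMP-CL", last_noun_field="dept", fits=is_teacher),
])
print([c.needed_field for c in clauses])  # ['tno', 'cno', 'cno', 'tno']
```

COMP-CL2 skips COMP-CL1 and the SC, because their needed field "cno" does not fit "who", and settles on the MC. This matches the result derived in the text.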

5.5.1 DESCOPING

In the local-tables of Fig 5.18, the second and third
local-tables together determine the "cno" field, while the
first and fourth local-tables together determine the "tno"
field. Also, the first local-table has an external field
"cno", the finding of which is specified in the next two
local-tables, corresponding to the SC and COMP-CL1. So, to
achieve both the elimination of the forward reference and the
grouping together of the local-tables which describe
essentially the same fields, we do the descoping.

The first step in descoping is to group together those
local-tables which are the innermost and put them at the
beginning of the reorder-list. By innermost we mean those
local-tables which have no external reference and which
together determine the same field.

In the above example we see that the second and third
are the innermost group of local-tables, because they have no
external fields and they together determine the same needed
field "cno". So we keep these two local-tables together.

The second step of descoping is to find the next higher
group of clauses and to place it beneath the previous group
in the reorder-list. By "next higher" we mean the group of
clauses which refer to the same field and whose external
fields, if any, are determined by the previous groups of
clauses. This step is applied until all the local-tables are
exhausted.

In the above example we see that the MC and COMP-CL2
have the same needed field, and their external fields
(actually only the MC has an external field, "cno", which is
determined by the SC) are determined by the previous group of
local-tables. So, we keep this group together and place it
beneath the previous group. With this reordering, the
local-tables are as shown in Fig 5.19.

The final step is to put each group of local-tables in
a single list tagged with "intersect". This is done to
indicate the fact that the field is found by intersection of
the answers of both the local-tables. In the above example,
in Fig 5.19, the first two clauses together determine "cno".
Thus, the "cno" to be found must satisfy the first
local-table as well as the second local-table. So, we
indicate this by keeping them in a single entity tagged with
"intersect". Similarly, the third and fourth are tagged. The
final form of the local-tables is shown in Fig 5.20.
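The three descoping steps can be sketched as a grouping followed by an ordering in which each group comes after the groups that determine its external fields. This is a hedged sketch under stated assumptions. The `LocalTable` record and the function name `descope` are hypothetical, and local-tables are reduced to a needed field plus a set of external fields. A group with a single local-table would simply come out as a one-element "intersect" list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LocalTable:
    name: str                                         # clause label, e.g. "MC" or "COMP-CL1"
    needed: str                                       # needed field of the clause
    external: Set[str] = field(default_factory=set)   # fields used here but found elsewhere


def descope(tables: List[LocalTable]) -> List[List[str]]:
    """Reorder local-tables so that no group uses a field found by a later group."""
    # Group the local-tables that determine the same needed field (first-seen order).
    groups: Dict[str, List[LocalTable]] = {}
    for table in tables:
        groups.setdefault(table.needed, []).append(table)

    reorder_list: List[List[str]] = []
    determined: Set[str] = set()
    while groups:
        # A group is placed once every external field it uses is determined by
        # groups already in the list; the first round picks the innermost groups.
        ready = [f for f, g in groups.items()
                 if set().union(*(t.external for t in g)) <= determined]
        if not ready:
            raise ValueError("circular external references")
        for needed in ready:
            # The local-tables of one group are combined by intersection.
            reorder_list.append(["intersect", *(t.name for t in groups.pop(needed))])
            determined.add(needed)
    return reorder_list


reorder_list = descope([
    LocalTable("MC", "tno", {"cno"}),
    LocalTable("SC", "cno"),
    LocalTable("COMP-CL1", "cno"),
    LocalTable("COMP-CL2", "tno"),
])
print(reorder_list)
# [['intersect', 'SC', 'COMP-CL1'], ['intersect', 'MC', 'COMP-CL2']]
```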



CHAPTER 6 


SEMANTIC GRAPH DRIVER AND FORMAL QUERY GENERATION 

6.1 INTRODUCTION

The output of the local-table generator gives the
fields needed, the fields given and the hierarchy existing
among the various fields. This output does not contain
information such as what relations are to be chosen and
what operations, like selection, projection and join, are to
be performed on the identified fields.

The semantic graph driver gives such information in the
form of a formal query. This process of generating the
formal query depends on the relational structure, the
linking information and the key fields of the relations of
the database. This information about a DB is abstracted into
a graph which we call the Semantic Graph, and the semantic
graph driver takes the local-table and produces the formal
query using this graph.

6.2 DB DESCRIPTION

The example DB used in our system is shown in Fig 6.1.
The first three relations are entity relations and the last
two are linking relations between them. The fields of the
relations are as follows.
Teacher: teacher number (key), teacher name, teacher dept

Course: course number (key), course name, dept offering
the course

Student: student number (key), student name, student
department

Offer: course number, teacher number (of the one who
offers the course), year and semester of the offering

Credit: course number, the student taking the course,
year and semester of the crediting

6.3 FORMAL QUERY SYNTAX

The formal query is represented in relational algebra.
We illustrate with an example. Consider

"who is the teacher of dept = (computer science)"

The local-table structure of this query is shown in Fig
6.2(a). The formal query is given in Fig 6.2(b). We see that
the syntax is almost the same as relational algebra. The
formalism is

(<fname> = <relational algebra expression for fname>)

where <fname> is the field needed by a clause. If a clause
of the query has an external field, then that field is kept
as "ext" in that clause, but the relational algebra
expression to get that external field precedes this clause.
Similarly, if more than one clause together determine a
field, then the relational algebra expression to get such a
field is given as an "intersection" of these clauses. Fig
6.3 illustrates this. The second and third clauses of Fig
6.3(a) together determine a field ("cno") and this field is
external to the first clause. The formal query is
represented as shown in Fig 6.3(c). Fig 6.3(b) shows the
local-tables after dereferencing and descoping.
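To make the formalism concrete, here is a minimal sketch that evaluates such expressions over in-memory relations. Relations are modelled as lists of dicts. The operator functions (`select`, `select_in`, `project`, `intersect`) and all sample tuples are hypothetical. The second part mirrors the pattern described for Fig 6.3: two clauses whose answers are intersected to find "cno", evaluated before the clause that uses "cno" as "ext".

```python
from typing import Dict, Iterable, List

Relation = List[Dict[str, object]]


def select(rel: Relation, field: str, value: object) -> Relation:
    """SELECT field = value."""
    return [t for t in rel if t[field] == value]


def select_in(rel: Relation, field: str, values: Iterable[object]) -> Relation:
    """SELECT field IN values, used to plug in an already computed 'ext' field."""
    wanted = set(values)
    return [t for t in rel if t[field] in wanted]


def project(rel: Relation, field: str) -> List[object]:
    """PROJECT field (duplicates removed, sorted for stable output)."""
    return sorted({t[field] for t in rel})


def intersect(a: List[object], b: List[object]) -> List[object]:
    return sorted(set(a) & set(b))


# Hypothetical sample tuples.
TEACHER = [{"tno": "t1", "dept": "computer science"},
           {"tno": "t2", "dept": "electrical engineering"},
           {"tno": "t3", "dept": "computer science"}]
COURSE = [{"cno": "cs415", "dept": "computer science"},
          {"cno": "cs325", "dept": "computer science"},
          {"cno": "ee201", "dept": "electrical engineering"}]
OFFER = [{"cno": "cs415", "tno": "t1", "year": 84},
         {"cno": "cs325", "tno": "t3", "year": 83},
         {"cno": "ee201", "tno": "t2", "year": 84}]

# (tno = PROJECT tno (SELECT dept = (computer science) teacher))
tno = project(select(TEACHER, "dept", "computer science"), "tno")

# Two clauses together determine "cno", so their answers are intersected ...
cno = intersect(project(select(COURSE, "dept", "computer science"), "cno"),
                project(select(OFFER, "year", 84), "cno"))
# ... and that expression precedes the clause that uses "cno" as "ext".
ext_tno = project(select_in(OFFER, "cno", cno), "tno")
print(tno, cno, ext_tno)  # ['t1', 't3'] ['cs415'] ['t1']
```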

6.4 GRAPH DESIGN

The graph driver program takes two fields, called
candidate fields. One is known and the other is required.
It produces "primitives of the formal query", which indicate
what relations are to be chosen and what fields are to be
joined, selected or projected, etc. By performing this
primitive step iteratively, we can find the needed field
from the given fields of the local-table.

The syntax of the formal query was described earlier.
The design of the graph which is used to produce the formal
query is discussed here.

Sometimes the joins which must be performed to find one
candidate field from the other are obscured in the English
query and so are not available in the local-table. Such
missing joins have to be inserted by the graph driver.

We illustrate with an example.

"who is the teacher of cname = (data structures)"

The needed field "tno" can be found from "cname" as
follows.

Instantiate the field "cname" to "(data structures)".
Select the tuples of the relation "course" which have
"cname" as instantiated and project "cno" from these
tuples. Join these with the relation "offer" on the "cno"
field. Then project the "tno" field of the "offer" tuples
so obtained.
In brief, "tno" is found from "cname" by joining on
the field "cno" into the "offer" relation. This join is
obscured in the query as well as in the local-table. Such
missing joins have to be inserted by the graph-driver
process if its candidate fields so demand.

However, identification of missing joins is not unique.
In the above case, we can also find "tno" from "cname" by
joining on the field "dept" in the relation "teacher" and
then projecting the "tno" field. But such an insertion of
the join would mean:

"who is the teacher in the dept offering cname = (data
structures)"

The join on "dept" (the phrase "in the dept offering") is
inserted in this second interpretation. But this
interpretation does not mean what the query is originally
intended to mean. The query assumes a join on "cno" in the
"offer" relation, whereas the latter interpretation makes a
join on the "dept" field.

Thus, not only the insertion of a missing join but also
the choice of an appropriate join is required in this
process.
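The sketch below shows why the choice of join matters. It evaluates both interpretations of "who is the teacher of cname = (data structures)" on the same hypothetical tuples. The helper names `select`, `join` and `project`, and every tuple, are illustrative assumptions. `join` is a plain natural join on one named field.

```python
from typing import Dict, List

Relation = List[Dict[str, object]]


def select(rel: Relation, field: str, value: object) -> Relation:
    return [t for t in rel if t[field] == value]


def join(r: Relation, s: Relation, field: str) -> Relation:
    """Join r and s on a single shared field, merging matching tuples."""
    return [{**a, **b} for a in r for b in s if a[field] == b[field]]


def project(rel: Relation, field: str) -> List[object]:
    return sorted({t[field] for t in rel})


# Hypothetical sample tuples.
COURSE = [{"cno": "cs201", "cname": "data structures", "dept": "computer science"},
          {"cno": "cs415", "cname": "compilers", "dept": "computer science"}]
OFFER = [{"cno": "cs201", "tno": "t1", "year": 84},
         {"cno": "cs415", "tno": "t3", "year": 84}]
TEACHER = [{"tno": "t1", "dept": "computer science"},
           {"tno": "t2", "dept": "electrical engineering"},
           {"tno": "t3", "dept": "computer science"}]

chosen = select(COURSE, "cname", "data structures")

# Intended reading: join on "cno" into "offer".
via_cno = project(join(chosen, OFFER, "cno"), "tno")
# Alternative reading: join on "dept" into "teacher".
via_dept = project(join(chosen, TEACHER, "dept"), "tno")
print(via_cno, via_dept)  # ['t1'] ['t1', 't3']
```

Both joins are legal, but only the first answers the question the user asked. The second also returns every other teacher of the same department.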
A large potential for such an ambiguity lies in
prepositions and verb phrases which connect two nouns (the
two candidate fields).

For example,

1. ...... teacher of cname = .....

2. ...... teacher teaching cname = .....

3. ...... teacher of dept = .....

The actual meaning of queries 1 and 2 is

"who is the teacher of the course which has name = (data
structures)"

But such a full specification of the join field is often
hidden in NL queries. So, we see that there is a need for
interpreting the path between two nouns that are connected
by such concise and terse phrases.

When a user types in a query with such terse phrases,
he assumes that the program he is interacting with is
capable of interpreting the appropriate path, i.e., he
assumes that the program knows the semantics of the DB. He
thus uses terse phrases to connect two nouns when he feels
that the two nouns are very closely related to each other
and hence can be interpreted by the program. Our claim is:

"A user uses a terse phrase (as in 1, 2 and 3 above)
between two nouns if he feels that the two nouns are very
closely related from the DB semantics."

We now show that this close relationship between two
pieces of data is used in the design of a DB. We
encapsulate this knowledge into a graph, and this graph is
used to insert the missing joins between the nouns connected
by the terse phrases. As we pointed out earlier, the nouns
of a terse phrase are closely related. Since the graph
encapsulates this knowledge of closeness from the DB, the
insertion of such missing joins is appropriately done by the
graph.

We proceed to show now that the DB design takes into
account this property of closeness. Actually, there is a
hierarchy of closeness in a DB.

The first step in a DB design is to decide which
fragments of data are to be pieced together into a relation.
We invariably keep together the attributes of a basic entity
in a relation, because they are closely related to each
other and they conceptually indicate a logically complete
entity.
We shall illustrate the design of both kinds of
relations by an example.

In our sample IITK academic database, we see that
there are at least three different entities:

(i) courses

(ii) teachers

(iii) students.

These are three different entities, and any user sees
them as three conceptually different chunks of data. This
accounts for our design of three ERs in the DB.

The next step is the design of the linking relations.
We see that a teacher, besides having a name and a dept,
also offers a course. Thus the two basic entities "teacher"
and "course" are related by "offer". So, we keep together
the keys of all the tuples of both the relations which
satisfy this additional property "offer". This is the reason
for the linking relation "offer". Its other fields indicate
some properties of the two key fields "cno" and "tno"
together.
The design of the second linking relation, "credit", is
similar.
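The distinction between entity relations and linking relations can be made mechanical once the schema is written down. The sketch below encodes the sample DB with short field names. "tname" is an assumption; the other names appear in the text. It then applies an assumed rule: a relation that carries the keys of two or more other relations is a linking relation. This is an illustration of the idea, not the system's own procedure.

```python
from typing import Dict

# Hypothetical encoding of the sample DB; the linking relations are given no
# single key here, which is also an assumption.
SCHEMA: Dict[str, dict] = {
    "teacher": {"key": "tno", "fields": ["tno", "tname", "dept"]},
    "course":  {"key": "cno", "fields": ["cno", "cname", "dept"]},
    "student": {"key": "sno", "fields": ["sno", "sname", "dept"]},
    "offer":   {"key": None,  "fields": ["cno", "tno", "year", "sem"]},
    "credit":  {"key": None,  "fields": ["cno", "sno", "year", "sem"]},
}


def classify(schema: Dict[str, dict]) -> Dict[str, str]:
    """Label each relation 'entity' or 'linking' (assumed rule, see above)."""
    entity_keys = {rel["key"] for rel in schema.values() if rel["key"] is not None}
    kinds = {}
    for name, rel in schema.items():
        # Keys of *other* relations that this relation carries.
        foreign_keys = [f for f in rel["fields"] if f in entity_keys and f != rel["key"]]
        kinds[name] = "linking" if len(foreign_keys) >= 2 else "entity"
    return kinds


kinds = classify(SCHEMA)
print(kinds)
# {'teacher': 'entity', 'course': 'entity', 'student': 'entity',
#  'offer': 'linking', 'credit': 'linking'}
```

"dept" is shared by three relations but is not a key, so it does not make any relation a linking relation. It is only a joinable field.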


Thus we see that there is a hierarchy in the design
from the viewpoint of closeness. We also pointed out that
the nouns in a terse phrase are, in general, very closely
related. So, if we adopt a method which interprets the path
between two nouns joined by a terse phrase by checking along
this hierarchy of closeness, and which decides from it what
joins are to be inserted, the interpretation will be correct
in general.

The design of our graph encapsulates this hierarchy, and
the missing joins are interpreted by this graph. The design
of the graph is as below.

The fields of a DB constitute the nodes of the graph.
Two nodes are connected by an arc if they are joinable or if
one is the key and the other is an attribute of a relation.
These arcs are undirected. Each arc is given a weight which
indicates the relative closeness of the fields connected by
the arc: the greater the closeness, the smaller the weight.
The closest fields, namely the fields of a relation, are
given the lowest weight. The next closest, the arcs between
the keys of the entity relations and the linking relations,
are given the next higher weight. Any other joinable arc is
given the highest weight.
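A hedged sketch of this construction on the sample DB follows. Several details are assumptions, not taken from the text:

- Nodes are named `relation.field`.
- Inside a linking relation, every field that is some entity's key is treated as a key.
- A join on any entity key, including one between the two linking relations, gets the middle weight.
- The numeric weights 1, 2 and 10 are illustrative; the text fixes only their order.

```python
from itertools import combinations
from typing import Dict, FrozenSet

# Hypothetical encoding of the sample DB ("tname" is an assumed field name).
SCHEMA: Dict[str, dict] = {
    "teacher": {"key": "tno", "fields": ["tno", "tname", "dept"]},
    "course":  {"key": "cno", "fields": ["cno", "cname", "dept"]},
    "student": {"key": "sno", "fields": ["sno", "sname", "dept"]},
    "offer":   {"key": None,  "fields": ["cno", "tno", "year", "sem"]},
    "credit":  {"key": None,  "fields": ["cno", "sno", "year", "sem"]},
}

# Assumed values; only LOW < MID < HIGH comes from the text.
LOW, MID, HIGH = 1, 2, 10


def build_graph(schema: Dict[str, dict]) -> Dict[FrozenSet[str], int]:
    entity_keys = {rel["key"] for rel in schema.values() if rel["key"] is not None}
    arcs: Dict[FrozenSet[str], int] = {}

    def add(a: str, b: str, weight: int) -> None:
        arc = frozenset((a, b))  # undirected arc
        arcs[arc] = min(weight, arcs.get(arc, weight))

    # Closest: key-attribute arcs inside one relation.
    for name, rel in schema.items():
        keys = {rel["key"]} if rel["key"] else {f for f in rel["fields"] if f in entity_keys}
        for a, b in combinations(rel["fields"], 2):
            if a in keys or b in keys:
                add(f"{name}.{a}", f"{name}.{b}", LOW)

    # Joinable arcs: the same field name in two relations.
    for (r1, rel1), (r2, rel2) in combinations(schema.items(), 2):
        for f in set(rel1["fields"]) & set(rel2["fields"]):
            add(f"{r1}.{f}", f"{r2}.{f}", MID if f in entity_keys else HIGH)
    return arcs


graph = build_graph(SCHEMA)
print(graph[frozenset(("student.sname", "student.sno"))])   # 1
print(graph[frozenset(("student.sno", "credit.sno"))])      # 2
print(graph[frozenset(("student.dept", "teacher.dept"))])   # 10
```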





The lowest-cost path gives the most meaningful path
between the fields of a terse phrase. Once the path is
found, it is very easy to identify what joins, selections
and projections are to be carried out.

The graph obtained by applying the above scheme to our
sample database is shown in Fig 6.4. The graph search
method used to find the least-weighted path is the same as
the one discussed in "Principles of AI" [Nil83]. We conclude
this chapter with an example.
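As a stand-in for that search method, the sketch below uses Dijkstra's algorithm (uniform-cost search) with Python's `heapq`. The edge list is a hypothetical fragment of a Fig 6.4-style graph. It uses assumed weights of 1 (arcs inside a relation), 2 (joins on an entity key) and 10 (other joins such as "dept"), so it illustrates least-cost path selection rather than reproducing the actual figure.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical arcs: (node, node, weight), nodes named relation.field.
EDGES: List[Tuple[str, str, int]] = [
    ("student.sname", "student.sno", 1), ("student.sno", "student.dept", 1),
    ("teacher.tno", "teacher.dept", 1), ("teacher.tno", "teacher.tname", 1),
    ("course.cno", "course.cname", 1), ("course.cno", "course.dept", 1),
    ("credit.sno", "credit.cno", 1), ("offer.cno", "offer.tno", 1),
    ("student.sno", "credit.sno", 2), ("credit.cno", "offer.cno", 2),
    ("course.cno", "credit.cno", 2), ("course.cno", "offer.cno", 2),
    ("offer.tno", "teacher.tno", 2),
    ("student.dept", "teacher.dept", 10), ("student.dept", "course.dept", 10),
    ("course.dept", "teacher.dept", 10),
]


def least_cost_path(edges: List[Tuple[str, str, int]], source: str,
                    target: str) -> Tuple[Optional[int], List[str]]:
    """Dijkstra's algorithm over undirected weighted arcs."""
    adjacency: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for a, b, w in edges:
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))

    dist: Dict[str, int] = {source: 0}
    prev: Dict[str, str] = {}
    frontier = [(0, source)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale entry: a cheaper route was already found
        for nxt, w in adjacency[node]:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(frontier, (nd, nxt))

    if target not in dist:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


cost, path = least_cost_path(EDGES, "student.sname", "teacher.tno")
print(cost, path)
# 9 ['student.sname', 'student.sno', 'credit.sno', 'credit.cno',
#    'offer.cno', 'offer.tno', 'teacher.tno']
```

With these weights the route along "dept" costs 13 (1 + 1 + 10 + 1), so the search prefers the path through "credit" and "offer".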

COMPLETE EXAMPLE

We shall illustrate the use of the graph with the
following query:

"who is the teacher of sname = (kumar)"

The local-table is shown in Fig 6.5.

The candidate fields are "tno" and "sname". We see
from Fig 6.4 that there is more than one path between
these fields, but the least-cost path is through the
"credit" relation.





[Fig 6.5: Local-table of the query]

[Fig 6.6: Formal query of the query of Fig 6.5]


The path is as follows. First, the tuples of the
"student" relation which have "sname" as instantiated above
are found. Then their "sno" fields are projected. These
fields are joined with the "sno" fields of the relation
"credit", and the "cno" fields of the "credit" tuples so
obtained are projected. Then these "cno" fields are joined
with the relation "offer" on the "cno" field. The "tno"
fields of the tuples so obtained are projected. These fields
provide the required answer.
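The steps above translate directly into select, project and join operations. In this minimal sketch, joining a projected column with a relation is written as a filter on that column (a semi-join). The helper names and all tuples are hypothetical. For contrast, the rejected path along "dept" is also evaluated.

```python
from typing import Dict, Iterable, List

Relation = List[Dict[str, object]]


def select(rel: Relation, field: str, value: object) -> Relation:
    return [t for t in rel if t[field] == value]


def join_on(rel: Relation, field: str, values: Iterable[object]) -> Relation:
    """Join a projected column with `rel` on `field` (a semi-join)."""
    wanted = set(values)
    return [t for t in rel if t[field] in wanted]


def project(rel: Relation, field: str) -> List[object]:
    return sorted({t[field] for t in rel})


# Hypothetical sample tuples.
STUDENT = [{"sno": "s1", "sname": "kumar", "dept": "electrical engineering"},
           {"sno": "s2", "sname": "sharma", "dept": "computer science"}]
CREDIT = [{"cno": "cs415", "sno": "s1", "year": 84},
          {"cno": "cs325", "sno": "s2", "year": 84}]
OFFER = [{"cno": "cs415", "tno": "t1", "year": 84},
         {"cno": "cs325", "tno": "t3", "year": 84}]
TEACHER = [{"tno": "t1", "dept": "computer science"},
           {"tno": "t2", "dept": "electrical engineering"},
           {"tno": "t3", "dept": "computer science"}]

snos = project(select(STUDENT, "sname", "kumar"), "sno")   # student -> sno
cnos = project(join_on(CREDIT, "sno", snos), "cno")        # credit  -> cno
tnos = project(join_on(OFFER, "cno", cnos), "tno")         # offer   -> tno
print(tnos)  # ['t1']

# The rejected path along "dept" answers a different question.
depts = project(select(STUDENT, "sname", "kumar"), "dept")
via_dept = project(join_on(TEACHER, "dept", depts), "tno")
print(via_dept)  # ['t2']
```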


Thus, we have interpreted the above query as follows:

"who teaches the course which is taken by the student
with sname = (kumar)"

The connection through the course that the teacher offers
and the student takes is the path inserted by the graph.

This is also what the user expects. The other path is
along the "dept" field of the two relations "student" and
"teacher". But this interpretation does not give the
meaning intended by our example query. Thus, we see that
the graph inserts the path between two nouns of a terse
phrase in an appropriate way.

The formal query for the query of Fig 6.5 is shown in
Fig 6.6.



CHAPTER 7 


CONCLUSIONS 

7.1 SUMMARY

Our natural language interface system has four phases:
Normalisation, Parsing, Local-Table Generation, and Formal
Query Generation.

The first phase, Normalisation, identifies the field
descriptions and substitutes them with a canonical
representation. Various descriptions of each field are
given in terms of an ATN grammar. On recognition of each
description, the actions associated with the ATN replace the
description with its canonical representation.

The second phase, Parsing, uses an ATN parser to
produce the parse of the normalised query. The
database-specific information, if any, of each word is kept
along with the word in the parse tree.

The third phase, Local-Table Generation, produces a
query in terms of what fields are needed and what fields are
specified. We call these queries skeleton queries. The
database-specific information required for this purpose is
taken from the parse structure. The heuristics of this phase
use this information to produce the skeleton query from the
parse structure.


The fourth phase produces the final formal query in
relational algebra using the semantic graph driver. It
identifies what relations have to be used and what
operations, like join, select and project, are to be
performed. The information required by this phase is
available in the semantic graph. This semantic graph is
designed from the database for which the system works as an
interface.

The system runs on the NLISP interpreter on our KL-10
processor.


7.2 CONCLUDING REMARKS


We have designed and implemented a system that
provides a natural language interface to a relational
database. It takes the database specification as a
parameter and produces the interface for the database.
Although the whole process is not automated, the database
specification is formalised. Translation of this database
specification into system parameters is algorithmised and
aided by a number of interactive functions.


The various characteristics of the database that must
be used to port the system to a new database are as below.

1. The descriptions of the various database fields. These
descriptions can conveniently be described by an ATN
grammar.

2. The database-specific nouns and verbs. The
database-specific information associated with certain nouns
and verbs is used in the third phase. The information that
must be provided is standardised, and a set of macros is
defined to enter this information into the lexicon.

3. The relations, their keys and the joinable fields. This
information is used to construct the semantic graph. The
way the graph can be constructed from this information is
algorithmised.


Thus, our basic objectives are achieved in designing a
Natural Language Interface system that takes the database
specification as a parameter.


7.3 FUTURE EXTENSIONS 

7.3.1 ELLIPSIS 

To understand the meaning of ellipsis, consider the
following dialogue with an NLI system.

query 1: "who taught cs415"
response: (the teacher's name)
query 2: "and cs425"

The second query assumes the knowledge of the previous
query. Such queries, which do not specify the information
completely and assume some information from the previous
query, are said to have ellipsis. Our system does not
handle ellipsis. It can be extended to handle this.

7.3.2 NON-WH QUERIES

The system presently accepts "Wh" queries only. The
system can be extended to handle non-"Wh" queries, such as
yes/no queries and requesting queries. For this, the
grammar has to be extended, and the field extractor routines
of local-table generation also have to be changed.

7.3.3 ANAPHORIC REFERENCE

If certain words of a sentence refer to what was given
earlier in the sentence, then such sentences are said to
have anaphoric reference. Consider:
"who took cs415 from the teacher who taught him cs325
in 83"

In the query above, "him" refers to the "who" of "who
took cs415", a part of the query. Our system does not
handle such references. It can be extended to handle such
references also.
[Figure: ATN networks for VERB-PHRASE and QUALIFIER]